When running the ftputil unit tests with Python 3.5.3 on Fedora 25, some of the tests fail from time to time. These are
test/test_real_ftp.py::TestFTPFiles::test_only_closed_children FAILED
test/test_real_ftp.py::TestFTPFiles::test_no_timed_out_children FAILED
test/test_real_ftp.py::TestFTPFiles::test_no_delayed_226_children FAILED
...
test/test_real_ftp.py::TestOther::test_subsequent_reading FAILED
(output from Pytest). Again, occasionally some of the tests pass.
However, when returning now to ftputil after some months, I haven't seen
all of the four tests pass. I also saw the failures with Python 3.3.6,
compiled from source, although readinto
raises an IOError
then
instead of an OSError
(https://docs.python.org/3/whatsnew/3.5.html?highlight=readinto#io
may be related here).
I do not see any of the above test methods fail for Python 2.7.13.
When the failures happen, the traceback reads like this:
___________________________________________________ TestOther.test_subsequent_reading ___________________________________________________
self = <test.test_real_ftp.TestOther object at 0x7fc0a7e20be0>
def test_subsequent_reading(self):
# Open a file for reading.
with self.host.open("debian-keyring.tar.gz", "rb") as file1:
pass
# Make sure that there are no problems if the connection is reused.
> with self.host.open("debian-keyring.tar.gz", "rb") as file2:
test/test_real_ftp.py:880:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ftputil/host.py:203: in open
host = self._available_child()
ftputil/host.py:167: in _available_child
host._session.pwd()
/usr/lib64/python3.5/ftplib.py:655: in pwd
resp = self.voidcmd('PWD')
/usr/lib64/python3.5/ftplib.py:277: in voidcmd
return self.voidresp()
/usr/lib64/python3.5/ftplib.py:250: in voidresp
resp = self.getresp()
/usr/lib64/python3.5/ftplib.py:235: in getresp
resp = self.getmultiline()
/usr/lib64/python3.5/ftplib.py:221: in getmultiline
line = self.getline()
/usr/lib64/python3.5/ftplib.py:203: in getline
line = self.file.readline(self.maxline + 1)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <socket.SocketIO object at 0x7fc0a93a7da0>, b = <memory at 0x7fc0a96271c8>
def readinto(self, b):
"""Read up to len(b) bytes into the writable buffer *b* and return
the number of bytes read. If the socket is non-blocking and no bytes
are available, None is returned.
If *b* is non-empty, a 0 return value indicates that the connection
was shutdown at the other end.
"""
self._checkClosed()
self._checkReadable()
if self._timeout_occurred:
> raise OSError("cannot read from timed out object")
E OSError: cannot read from timed out object
/usr/lib64/python3.5/socket.py:573: OSError
A possibly interesting observation is that all of the above tests seem
to take a few seconds to run although they open the file
debian-keyring.tar.gz
only for reading. This file is about 14 MB
large.
The OSError
exception from Python's I/O layer in my opinion is
different from a timeout where the server sends a 4xx (temporary
failure) reply to the client.
Things to investigate:
Are there changes when we use a small file instead of
debian-keyring.tar.gz
?Why do the tests run so long (several seconds)?
Why don't the failures occur for Python 2.7.13 ?
Replying to schwa:
- Are there changes when we use a small file instead of
debian-keyring.tar.gz
?When using a small file (
CONTENTS
, 641 bytes) none of the four above tests fails within three test runs. Also, the pauses while running each test aren't noticable.
When adding
test_subsequent_reading
like this:def test_subsequent_reading(self): # Open a file for reading. import sys print("=== before first read", file=sys.stderr) with self.host.open("debian-keyring.tar.gz", "rb") as file1: pass # Make sure that there are no problems if the connection is reused. print("=== before second read", file=sys.stderr) with self.host.open("debian-keyring.tar.gz", "rb") as file2: pass print("=== after second read", file=sys.stderr) assert file1._session is file2._session
the output shows that the long pause happens between "=== before first read" and "=== before second read". The exception happens almost immediately after printing "=== before second read".
I print to
sys.stderr
to make sure that the output is unbuffered, i. e. appears as soon as the
I added another
test_subsequent_reading
:def test_subsequent_reading(self): # Open a file for reading. import sys print("=== before first read", file=sys.stderr) with self.host.open("debian-keyring.tar.gz", "rb") as file1: print("=== just opened", file=sys.stderr) pass # Make sure that there are no problems if the connection is reused. print("=== before second read", file=sys.stderr) with self.host.open("debian-keyring.tar.gz", "rb") as file2: pass print("=== after second read", file=sys.stderr) assert file1._session is file2._session
If the test fails, there's the long (several seconds) pause after "=== just opened", After printing "=== before second read" everything else happens quickly. Since "=== before second read" is printed, but "=== after second read" isn't, we can assume the exception happens in the
open
of the secondwith
statement. This can also be seen in the traceback above.If the test succeeds (which happens rarely), there's a long pause after "=== before sedond read", but curiously the execution up to his point is quick. We do not see the long pause between "=== just opened" and "=== before second read".
I tried to reproduce the bug in isolation from other tests in the attached
ticket112.py
. At first, the bug didn't seem to occur at all, but finally showed up in iteration 3660:Traceback (most recent call last): File "./private/ticket112.py", line 43, in <module> test_subsequent_reading() File "./private/ticket112.py", line 31, in test_subsequent_reading with host.open("debian-keyring.tar.gz", "rb") as file2: File "/home/schwa/sd/python/ftputil/ftputil/host.py", line 203, in open host = self._available_child() File "/home/schwa/sd/python/ftputil/ftputil/host.py", line 167, in _available_child host._session.pwd() File "/usr/lib64/python3.5/ftplib.py", line 655, in pwd resp = self.voidcmd('PWD') File "/usr/lib64/python3.5/ftplib.py", line 277, in voidcmd return self.voidresp() File "/usr/lib64/python3.5/ftplib.py", line 250, in voidresp resp = self.getresp() File "/usr/lib64/python3.5/ftplib.py", line 235, in getresp resp = self.getmultiline() File "/usr/lib64/python3.5/ftplib.py", line 221, in getmultiline line = self.getline() File "/usr/lib64/python3.5/ftplib.py", line 203, in getline line = self.file.readline(self.maxline + 1) File "/usr/lib64/python3.5/socket.py", line 573, in readinto raise OSError("cannot read from timed out object") OSError: cannot read from timed out object
The call chain looks the same as the previous one from running all unit tests.
Since in the unit tests a smaller file seemed to make the failure a lot less likely, I tried to reproduce the failure with a bigger file with exactly 512 MB. I got a traceback in iteration 2411:
Traceback (most recent call last): File "private/ticket112.py", line 43, in <module> test_subsequent_reading() File "private/ticket112.py", line 31, in test_subsequent_reading with host.open("debian-keyring.tar.gz", "rb") as file2: File "/home/schwa/sd/python/ftputil/ftputil/host.py", line 203, in open host = self._available_child() File "/home/schwa/sd/python/ftputil/ftputil/host.py", line 167, in _available_child host._session.pwd() File "/usr/lib64/python3.5/ftplib.py", line 655, in pwd resp = self.voidcmd('PWD') File "/usr/lib64/python3.5/ftplib.py", line 277, in voidcmd return self.voidresp() File "/usr/lib64/python3.5/ftplib.py", line 250, in voidresp resp = self.getresp() File "/usr/lib64/python3.5/ftplib.py", line 235, in getresp resp = self.getmultiline() File "/usr/lib64/python3.5/ftplib.py", line 221, in getmultiline line = self.getline() File "/usr/lib64/python3.5/ftplib.py", line 203, in getline line = self.file.readline(self.maxline + 1) File "/usr/lib64/python3.5/socket.py", line 573, in readinto raise OSError("cannot read from timed out object") OSError: cannot read from timed out object
The last log entries were:
2017-07-24 14:09:00,231:DEBUG:Iteration 2407 2017-07-24 14:09:00,246:DEBUG:=== before first read 2017-07-24 14:09:00,256:DEBUG:=== just opened 2017-07-24 14:09:00,257:DEBUG:=== before second read 2017-07-24 14:09:00,260:DEBUG:=== after second read 2017-07-24 14:09:00,260:DEBUG:Iteration 2408 2017-07-24 14:09:00,275:DEBUG:=== before first read 2017-07-24 14:09:00,294:DEBUG:=== just opened 2017-07-24 14:09:00,295:DEBUG:=== before second read 2017-07-24 14:09:05,303:DEBUG:=== after second read 2017-07-24 14:09:05,303:DEBUG:Iteration 2409 2017-07-24 14:09:05,315:DEBUG:=== before first read 2017-07-24 14:09:05,329:DEBUG:=== just opened 2017-07-24 14:09:05,330:DEBUG:=== before second read 2017-07-24 14:09:10,337:DEBUG:=== after second read 2017-07-24 14:09:10,337:DEBUG:Iteration 2410 2017-07-24 14:09:10,358:DEBUG:=== before first read 2017-07-24 14:09:10,371:DEBUG:=== just opened 2017-07-24 14:09:10,372:DEBUG:=== before second read 2017-07-24 14:09:15,376:DEBUG:=== after second read 2017-07-24 14:09:15,376:DEBUG:Iteration 2411 2017-07-24 14:09:15,392:DEBUG:=== before first read 2017-07-24 14:09:15,411:DEBUG:=== just opened 2017-07-24 14:09:20,417:DEBUG:=== before second read
I can now reproduce the problem much more reliably with the attached script
ticket112.2.py
(renamed fromticket112.py
by Trac).
I went through the function
test_only_closed_children
, most of the time stepping further into the code (withs
inipdb
). I didn't notice anything strange.After that, I used
n
to go throughtest_only_closed_children
for 9 calls. That worked without visible problems and everything I saw during the execution seemed as expected.Only after using
c
to run the problem regularly, I promptly got anOSError: cannot read from timed out object
again.
I added some sleep calls to
test_only_closed_children
:def test_only_closed_children(): import time with ftputil.FTPHost(*LOGIN_DATA) as host: REMOTE_FILE_NAME = "debian-keyring.tar.gz" log.debug("=== before first read") #time.sleep(0.1) with host.open(REMOTE_FILE_NAME, "rb") as file_obj1: time.sleep(0.1) # Create empty file and close it. log.debug("=== before second read") with host.open(REMOTE_FILE_NAME, "rb") as file_obj2: time.sleep(0.1) pass # This should re-use the second child because the first isn't # closed but the second is. log.debug("=== before third read") #time.sleep(0.1) with host.open(REMOTE_FILE_NAME, "rb") as file_obj: time.sleep(0.1) assert len(host._children) == 2 assert file_obj._host is host._children[1] log.debug("=== after third read")
Some
sleep
calls I started with were successively commented out. The above version of the code ran for 1000 iterations; after that I interrupted the program. The time between iterations was about 0.35 seconds. This makes sense because there are 3sleep
calls of 0.1 seconds each.
I also commented out the
sleep
call after the firsthost.open
call. After removing the commented-outsleep
calls, the function reads:def test_only_closed_children(): import time with ftputil.FTPHost(*LOGIN_DATA) as host: REMOTE_FILE_NAME = "debian-keyring.tar.gz" log.debug("=== before first read") with host.open(REMOTE_FILE_NAME, "rb") as file_obj1: # Create empty file and close it. log.debug("=== before second read") with host.open(REMOTE_FILE_NAME, "rb") as file_obj2: time.sleep(0.1) pass # This should re-use the second child because the first isn't # closed but the second is. log.debug("=== before third read") with host.open(REMOTE_FILE_NAME, "rb") as file_obj: time.sleep(0.1) assert len(host._children) == 2 assert file_obj._host is host._children[1] log.debug("=== after third read")
The program is still able to run for 1000 iterations (and more). The time between iterations is 0.25 seconds.
Commenting out the first remaining
sleep
call (after the secondopen
call) makes the program raise anOSError
as in the ticket description. Sometimes this happens in the very first iteration. The log manages to show "=== after third read", so we can presume that the exception happens inFTPHost._available_child
in the lastFTPHost.open
call.The outputs "=== Iteration 1", "=== before first read" and "=== before second read" happen shortly after one another. On the other hand, the time between "=== before second read" and "=== before third read" is about 5 seconds.
Restoring the first
sleep
call and commenting out the second makes the function run repeatedly again. The time between iterations is about 5 seconds. I stopped the execution after about 50 iterations.As it happens, 5 seconds is the timeout constant defined in
ftputil/file.py
as:class FTPFile(object): """ Represents a file-like object associated with an FTP host. File and socket are closed appropriately if the `close` method is called. """ # Set timeout in seconds when closing file connections (see ticket <a href="/~sschwarzer/ftputil/51" title="~sschwarzer/ftputil#51: _FTPFile.close() blocks after reading only a part of a file">#51</a>). _close_timeout = 5
and used in
close
in the same class:def close(self): """Close the `FTPFile`.""" if self.closed: return # Timeout value to restore, see below. # Statement works only before the try/finally statement, # otherwise Python raises an `UnboundLocalError`. old_timeout = self._session.sock.gettimeout() try: self._fobj.close() self._fobj = None with ftputil.error.ftplib_error_to_ftp_io_error: self._conn.close() # Set a timeout to prevent waiting until server timeout # if we have a server blocking here like in ticket <a href="/~sschwarzer/ftputil/51" title="~sschwarzer/ftputil#51: _FTPFile.close() blocks after reading only a part of a file">#51</a>. self._session.sock.settimeout(self._close_timeout) try: with ftputil.error.ftplib_error_to_ftp_io_error: self._session.voidresp() except ftputil.error.FTPIOError as exc: # Ignore some errors, see tickets <a href="/~sschwarzer/ftputil/51" title="~sschwarzer/ftputil#51: _FTPFile.close() blocks after reading only a part of a file">#51</a> and <a href="/~sschwarzer/ftputil/17" title="~sschwarzer/ftputil#17: FTPFile.close() may result in FTPIOException">#17</a> at # https://todo.sr.ht/~sschwarzer/ftputil/51 and # https://todo.sr.ht/~sschwarzer/ftputil/17, # respectively. exc = str(exc) error_code = exc[:3] if exc.splitlines()[0] != "timed out" and \ error_code not in ("150", "426", "450", "451"): raise finally: # Restore timeout for socket of `FTPFile`'s `ftplib.FTP` # object in case the connection is reused later. self._session.sock.settimeout(old_timeout) # If something went wrong before, the file is probably # defunct and subsequent calls to `close` won't help # either, so we consider the file closed for practical # purposes. self.closed = True
When I increase the timeout to, say, 8 seconds, the time between iterations grows to 8 seconds. So if we have only the first
sleep
call,FTPFile.close
runs into the timeout.
Attached a script
ticket112_reduced.py
that still reproduces the problem. The only difference is that whereas the original script showed theOSError
in the first iteration, the simplified script may take a few iterations to show theOSError
. On my system, the exception usually occurs in the second or third iteration.
As more debugging showed, the trigger of the bug is in the method
SocketIO.readinto
fromsocket.py
in Python 3 (here the Python 3.5.3 implementation):def readinto(self, b): """Read up to len(b) bytes into the writable buffer *b* and return the number of bytes read. If the socket is non-blocking and no bytes are available, None is returned. If *b* is non-empty, a 0 return value indicates that the connection was shutdown at the other end. """ self._checkClosed() self._checkReadable() if self._timeout_occurred: raise OSError("cannot read from timed out object") while True: try: return self._sock.recv_into(b) except timeout: self._timeout_occurred = True raise except error as e: if e.args[0] in _blocking_errnos: return None raise
When explaining the bug, I refer to the function
main
in the recently attachedticket112_reduced.py
.When the first remote file is opened and closed in
main
(lines 24 and 25),FTPFile.close
times out in the callself._session.voidresp
(in about the middle ofclose
):def close(self): """Close the `FTPFile`.""" if self.closed: return # Timeout value to restore, see below. # Statement works only before the try/finally statement, # otherwise Python raises an `UnboundLocalError`. old_timeout = self._session.sock.gettimeout() try: self._fobj.close() self._fobj = None with ftputil.error.ftplib_error_to_ftp_io_error: self._conn.close() # Set a timeout to prevent waiting until server timeout # if we have a server blocking here like in ticket <a href="/~sschwarzer/ftputil/51" title="~sschwarzer/ftputil#51: _FTPFile.close() blocks after reading only a part of a file">#51</a>. self._session.sock.settimeout(self._close_timeout) try: with ftputil.error.ftplib_error_to_ftp_io_error: self._session.voidresp() except ftputil.error.FTPIOError as exc: # Ignore some errors, see tickets <a href="/~sschwarzer/ftputil/51" title="~sschwarzer/ftputil#51: _FTPFile.close() blocks after reading only a part of a file">#51</a> and <a href="/~sschwarzer/ftputil/17" title="~sschwarzer/ftputil#17: FTPFile.close() may result in FTPIOException">#17</a> at # https://todo.sr.ht/~sschwarzer/ftputil/51 and # https://todo.sr.ht/~sschwarzer/ftputil/17, # respectively. exc = str(exc) error_code = exc[:3] if exc.splitlines()[0] != "timed out" and \ error_code not in ("150", "426", "450", "451"): raise finally: # Restore timeout for socket of `FTPFile`'s `ftplib.FTP` # object in case the connection is reused later. self._session.sock.settimeout(old_timeout) # If something went wrong before, the file is probably # defunct and subsequent calls to `close` won't help # either, so we consider the file closed for practical # purposes. self.closed = True
Apart from the
FTPIOError
that is raised by the timeout, theSocketIO.readinto
codetry: return self._sock.recv_into(b) except timeout: self._timeout_occurred = True raise
sets the instance variable
self._timeout_occured
toTrue
.When
main
executes the second remote access (lines 27 and 28), this executes the callhost._session.pwd
inFTPHost._available_child
. As already shown in previous comments, the lines inSocketIO.readinto
if self._timeout_occurred: raise OSError("cannot read from timed out object")
will now "pick up" the timeout that was remembered by the previous remote file access (specifically, the
FTPFile.close
call).This caused me a lot of confusion during debugging because the failure in the second
FTPHost.open
only happens because the timeout was set in the previousFTPFile.close
call. (In hindsight, this maybe isn't so surprising, but it took a bit to get there. ;-) )The problem doesn't happen in Python 2.7 because it uses a quite different implementation for socket file I/O and it seems Python 2.7 doesn't "remember" the timeout it ran into, so
FTPHost.open
has no problem reusing the previously timed-out FTP session (ftplib.FTP
instance).To fix the problem in ftputil, the obvious way would be to not only catch
ftplib.error_temp
andftplib.error_reply
inFTPHost._available_child
, but alsoOSError
. However, catching the lower-level exception here together with the higher-levelftplib.error_*
exceptions looks odd and I wonder if there isn't a cleaner solution.
For the record, the implementation of
SocketIO.readinto
in Python 3.1 wasdef readinto(self, b): """Read up to len(b) bytes into the writable buffer *b* and return the number of bytes read. If the socket is non-blocking and no bytes are available, None is returned. If *b* is non-empty, a 0 return value indicates that the connection was shutdown at the other end. """ self._checkClosed() self._checkReadable() return self._sock.recv_into(b)
The timeout handling was added in Python 3.2.
Replying to schwa:
This caused me a lot of confusion during debugging because the failure in the second
FTPHost.open
only happens because the timeout was set in the previousFTPFile.close
call. (In hindsight, this maybe isn't so surprising, but it took a bit to get there. ;-) )Debugging was difficult because
- The problem occurs only with specific timing. The use of a debugger can make the bug "disappear." Therefore, I had to use logging/print calls a lot more than I would have liked.
- Some of the execution goes on in C code that is hidden from the Python debugger.
- The I/O code uses quite a few inheritance relationships.
For a while, I thought
self._timeout_occured
may be set from C code in a base class ofSocketIO
.
I'm thinking of starting a discussion on this on python-dev.
The timeout handling in
SocketIO
is referenced in
The documentation for the ftplib module mentions
OSError
only inftplib.all_errors
The set of all exceptions (as a tuple) that methods of FTP instances may raise as a result of problems with the FTP connection (as opposed to programming errors made by the caller). This set includes the four exceptions listed above as well as OSError.
There's no mention of any exceptions for
FTP.pwd
.
An experiment in the interpreter confirms that a "normal" idle timeout still raises on
error_temp
, not anOSError
. Otherwise I'd been very surprised because raising anOSError
initially would probably break a lot of code that usesftplib
.The socket timeout that we see in this ticket isn't the idle timeout of the FTP level. After all, the exception happens because the FTP operations happen fast, not because they happen slowly. With a
time.sleep(0.1)
in the firstwith
block inmain
, theOSError
doesn't occur although from the time between "Iteration n" outputs there is a timeout in each iteration. I wonder how the timeout in this case is different from the timeout that leads to theOSError
.
Fixed in ad4a3fabca5ce074495192e8a7e82b65c4323da6.