(error also happens in other contexts, for example:
When writing data towards server FTPS, target file is randomly truncated. No traceback, truncation is not detected by client.
import io
import logging
import json
logging.basicConfig(level=logging.DEBUG)
import ftplib
from ftputil import FTPHost
import ftputil.session
session_factory = ftputil.session.session_factory(base_class=ftplib.FTP_TLS, port=60000)
file_name = "big_file.json"
with open(file_name) as f:
data = json.load(f)
with FTPHost(
"localhost",
"user",
"12345",
session_factory=session_factory) as host:
with host.open(file_name, "w") as target_f:
host.copyfileobj(io.StringIO(json.dumps(data, indent=4)), target_f)
# host.stat(file_name)
"""
An RFC-4217 asynchronous FTPS server supporting both SSL and TLS.
Requires PyOpenSSL module (http://pypi.python.org/pypi/pyOpenSSL).
"""
import logging
logging.basicConfig(level=logging.DEBUG)
from pyftpdlib.servers import FTPServer
from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import TLS_FTPHandler
def main():
authorizer = DummyAuthorizer()
authorizer.add_user('user', '12345', './ftp_files', perm='elradfmwMT')
authorizer.add_anonymous('.')
handler = TLS_FTPHandler
handler.certfile = "./ssl/ssl.chain"
handler.keyfile = "./ssl/ssl.key"
handler.authorizer = authorizer
handler.masquerade_address = '0.0.0.0'
handler.passive_ports = range(60001, 65535)
# requires SSL for both control and data channel
handler.tls_control_required = True
handler.tls_data_required = True
server = FTPServer(("", 60000), handler)
server.serve_forever()
if __name__ == '__main__':
main()
Thanks a lot for the thorough report and for trying so many things to close in on the problem! :-)
This bug is probably going to be interesting, but let's see. ... ;-)
I have a few questions:
- Were you able to experiment with another FTP server?
- At least some of the "ideas" look as if they're actual observations, though "ideas" conveys that these items are "only" suspicions. Are all of these points observations? Are some of them suspicions instead of observations, and if yes, which ones?
- When you say the truncation size is random, do you see truncations only on sizes that are divisible by a certain value? For example, is the size is always a multiple of the buffer size of
copyfileobj
?I hope to find the time to look further into this ticket on the weekend.
One more question: When you say, "truncation size is random", does this refer to the very same file? Or do you mean you used different input/data and can't recognize a pattern between the original size of the data and the size you're seeing in the written remote data?
Hello, thanks for your comments !
Sorry for my late reply, is there a way to be notified by mail of the activity of this ticket ?
To answer your questions :
- Were you able to experiment with another FTP server? : no
- At least some of the "ideas" look as if they're actual observations, though "ideas" conveys that these items are "only" suspicions. Are all of these points observations? Are some of them suspicions instead of observations, and if yes, which ones? : The only thing that is suspicious is FTPS : the problem never occurs using FTP. The rest of the points are observations, I shouldn't have called them ideas. I wouldn't say they are suspicious, but they make me feel that ftps connection may be closed before write is finished, under certain circumstances, depending on transfer rates and file size. However my knowledge on both FTP protocol and TLS is insufficient, so I'm realy not sure it is a good direction to investigate.
- When you say the truncation size is random, do you see truncations only on sizes that are divisible by a certain value? For example, is the size is always a multiple of the buffer size of copyfileobj? I haven't payed attention and did not keep a copy of all truncated files. One of them is 435 KB for example.
- When you say, "truncation size is random", does this refer to the very same file? Or do you mean you used different input/data and can't recognize a pattern between the original size of the data and the size you're seeing in the written remote data? : I mean on the same file : same operation on same file leads to random truncations.
Sorry for my late reply, is there a way to be notified by mail of the activity of this ticket ?
You can put yourself into the cc field when you modify the ticket, but I guess this will be readable by spammers, too. :-/
It's planned to move the ftputil infrastructure to Sourcehut sometime during the next few months. I expect the ticket situation will get better then.
Thanks for your answers. I hope I can look into this on the weekend. Probably it makes most sense if I try to reproduce the problem with your pyftpdlib code before trying to reproduce it with my local FTP/FTPS server.
One more question though: Do you see truncation if you just write a large string with
remote_file.write
?You wrote something about that, but I'm not sure if you tried to leave out the JSON dump and the use of
StringIO
. I ask because I think we should make sure that no buffering effects ofjson.dumps
orStringIO
play a role here. (I doubt there are any, but still I suggest we should try to reproduce the problem with as few moving parts as possible.)Another question: Did you leave the FTP server running for a bit after closing the ftputil remote file? Maybe the server buffers some data and writes it only after a while. Can you try letting the server idle ten seconds or so after the ftputil remote file is closed?
1. using remote_file.write
The same problem occurs.
The string does not need to be very big.
This time I checked the size and here is what I get :
source file: 489 KB
- 1st write, target file: 208 KB
- 2nd write, target file: 224 KB
- 3rd, 4th, 5th write, target file: 208 KB
- 6th, 7th write: 176 KB
- 8th write: 192 KB
...
2. leaving the server open
During all my tests, the server stays open a lot more than 10 seconds after running the test.
I can reproduce the problem with the following client script:
import io import logging logging.basicConfig(level=logging.DEBUG) import ftplib from ftputil import FTPHost import ftputil.session session_factory = ftputil.session.session_factory(base_class=ftplib.FTP_TLS, port=60000) file_name = "my_data" data = 489 * 1024 * "x" data = data.encode('ascii') print("=== data:", len(data)) # with open(file_name, 'wb') as fobj: # fobj.write(data) with FTPHost( "localhost", "user", "12345", session_factory=session_factory) as host: with host.open(file_name, "wb") as target_f: host.copyfileobj(io.BytesIO(data), target_f)
I see interesting messages in the server log (printed to the screen):
DEBUG:pyftpdlib:127.0.0.1:58224-[user] <- STOR my_data DEBUG:pyftpdlib:127.0.0.1:58224-[user] -> 125 Data connection already open. Transfer starting. DEBUG:pyftpdlib:[debug] call: _do_ssl_handshake, err: ssl-want-read (<TLS_DTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=0)>) DEBUG:pyftpdlib:[debug] SSL connection established (<TLS_DTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=0)>) DEBUG:pyftpdlib:[debug] call: recv(), err: SysCallError(104, 'ECONNRESET') (<TLS_DTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=229376)>) DEBUG:pyftpdlib:[debug] call: _do_ssl_shutdown() -> os.write, err: BrokenPipeError(32, 'Broken pipe') (<TLS_DTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=229376)>) DEBUG:pyftpdlib:[debug] call: close() (<TLS_DTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=229376)>) DEBUG:pyftpdlib:127.0.0.1:58224-[user] -> 226 Transfer complete. INFO:pyftpdlib:127.0.0.1:58224-[user] STOR /.../ftp_files/my_data completed=1 bytes=229376 seconds=0.003 DEBUG:pyftpdlib:[debug] call: recv(), err: SysCallError(-1, 'Unexpected EOF') (<TLS_FTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user')>) DEBUG:pyftpdlib:[debug] call: _do_ssl_shutdown(), shutdown completed (<TLS_FTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user')>) DEBUG:pyftpdlib:[debug] call: close() (<TLS_FTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user')>) INFO:pyftpdlib:127.0.0.1:58224-[user] FTP session closed (disconnect). DEBUG:pyftpdlib:[debug] call: recv(), err: SysCallError(-1, 'Unexpected EOF') (<TLS_FTPHandler(id=140404957278752, addr='127.0.0.1:58222', ssl=True, user='user')>) DEBUG:pyftpdlib:[debug] call: _do_ssl_shutdown(), shutdown completed (<TLS_FTPHandler(id=140404957278752, addr='127.0.0.1:58222', ssl=True, user='user')>) DEBUG:pyftpdlib:[debug] call: close() (<TLS_FTPHandler(id=140404957278752, addr='127.0.0.1:58222', ssl=True, user='user')>) INFO:pyftpdlib:127.0.0.1:58222-[user] FTP session closed (disconnect).
Note the
BrokenPipeError
and the most likely relatedSysCallError(-1, 'Unexpected EOF')
. As in your case, the number of transferred bytes, here 229376, varies between runs of the client script.I'll try to find out more.
With a
time.sleep(1)
at the end of the innerwith
block, i.e.with host.open(file_name, "wb") as target_f: host.copyfileobj(io.BytesIO(data), target_f, max_chunk_size=sys.maxsize) time.sleep(1)
the file is transferred completely, but the server output still reports a broken pipe error:
DEBUG:pyftpdlib:127.0.0.1:58310-[user] <- STOR my_data DEBUG:pyftpdlib:127.0.0.1:58310-[user] -> 125 Data connection already open. Transfer starting. DEBUG:pyftpdlib:[debug] call: _do_ssl_handshake, err: ssl-want-read (<TLS_DTPHandler(id=140407016892112, addr='127.0.0.1:58310', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=0)>) DEBUG:pyftpdlib:[debug] SSL connection established (<TLS_DTPHandler(id=140407016892112, addr='127.0.0.1:58310', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=0)>) DEBUG:pyftpdlib:[debug] call: recv(), err: SysCallError(104, 'ECONNRESET') (<TLS_DTPHandler(id=140407016892112, addr='127.0.0.1:58310', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=500736)>) DEBUG:pyftpdlib:[debug] call: _do_ssl_shutdown() -> os.write, err: BrokenPipeError(32, 'Broken pipe') (<TLS_DTPHandler(id=140407016892112, addr='127.0.0.1:58310', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=500736)>) DEBUG:pyftpdlib:[debug] call: close() (<TLS_DTPHandler(id=140407016892112, addr='127.0.0.1:58310', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=500736)>) DEBUG:pyftpdlib:127.0.0.1:58310-[user] -> 226 Transfer complete. INFO:pyftpdlib:127.0.0.1:58310-[user] STOR /.../ftp_files/my_data completed=1 bytes=500736 seconds=1.005
For the record, a
time.sleep
after the innerwith
block, but inside the outerwith
block, doesn't seem to help though.I have a suspicion what's going on, but need to look into this more.
I think I fixed the problem. Would you please clone/pull the latest change from https://git.sr.ht/~sschwarzer/ftputil and verify whether the change also works for you and report back?
Commit message:
Unwrap `FTPFile` TLS socket before closing it Without the unwrapping, there may be data "stuck" in the TLS wrapper, so the underlying socket isn't "flushed" when closing the socket with `self._conn.close()`. This problem was raised in ticket <a href="/~sschwarzer/ftputil/149" title="~sschwarzer/ftputil#149: remote files randomly truncated">#149</a>. Probably it wasn't noticed earlier, since waiting a short time (one second or even less; where "waiting" could mean doing anything else that takes a moment) between the last write to the socket and closing the `FTPFile` seems to transmit all data. That said, pyftpdlib still reports a broken pipe error in this case. I found the solution when checking the current ftplib source code for `FTP.storbinary`. With this commit, the workaround of waiting before closing the `FTPFile` is no longer needed and the broken pipe error in the pyftpdlib log is gone. ticket: 149
Hello, I confirm it works for me !
Well done and thanks for your reactivity.
And by the way, congratulations and thanks for all your work on ftputil !
I'm glad to hear the change works for you! :-) I'll probably release a new ftputil version on Wednesday.
Well done and thanks for your reactivity. And by the way, congratulations and thanks for all your work on ftputil !
Thank you, such compliments are always nice to get. :-)