For ticket
#96,
I was looking into the exact behavior of ftputil when encoding and
decoding file system paths for FTPHost.listdir
.
It turned out that under Python 2 ftputil doesn't work correctly with paths that contain non-ASCII characters. The intended behavior is:
Path argument is a byte string:
Characters are sent as they are, even if they are non-ASCII characters (i. e. character codes greater than 127).
Returned lines are byte strings (latin1-encoded, i. e. the data sent
by the server) if the listdir
argument was a byte string.
Path argument is a unicode string:
Unicode strings are assumed to be decoded from latin1. For example,
to get the contents of the directory b'\xc3\xa4bc'
(corresponding
to "äbc"), the unicode string must be
b'\xc3\xa4bc'.decode("latin1")
.
Returned lines are unicode strings (decoded assuming latin1 encoding).
This behavior is indeed seen when ftputil runs under Python 3. (By the
way, the handling of unicode strings is strange, but it's used to be
compatible with Python 3's ftplib.FTP
implementation. If
ftputil.FTPHost
would behave differently, it couldn't work with
non-ASCII paths that were created with ftplib.FTP
under Python 3.)
Under Python 2, pure ASCII strings work as above regardless of being
byte strings or unicode strings. However, when the strings contain
characters with codes greater than 127, FTPHost.listdir
raises a
UnicodeEncodeError
, again both for byte strings and unicode strings.
There's a unit test
test.test_host.TestAcceptEitherUnicodeOrBytes.test_listdir
:def test_listdir(self): """Test whether `listdir` accepts either unicode or bytes.""" host = self.host as_bytes = ftputil.tool.as_bytes host.chdir("/home/file_name_test") # Unicode items = host.listdir("ä") self.assertEqual(items, ["ö", "o"]) # Need explicit type check for Python 2 for item in items: self.assertTrue(isinstance(item, ftputil.compat.unicode_type)) # Bytes items = host.listdir(as_bytes("ä")) self.assertEqual(items, [as_bytes("ö"), as_bytes("o")]) # Need explicit type check for Python 2 for item in items: self.assertTrue(isinstance(item, ftputil.compat.bytes_type))
As far as I can tell this should catch the problematic behavior in the ticket description. I need to find out why the test doesn't uncover the problem.
The problem with the unit test is that it uses a mock session, that is, it doesn't use
ftplib.FTP
to access a real FTP server. Instead, the mockdir
method returns lines from a multi-line string. Here's the mock implementation offtplib.FTP.dir
:def dir(self, *args): """ Provide a callback function for processing each line of a directory listing. Return nothing. """ # The callback comes last in `ftplib.FTP.dir`. if isinstance(args[-1], collections.Callable): # Get `args[-1]` _before_ removing it in the line after. callback = args[-1] args = args[:-1] else: callback = None # Everything before the path argument are options. path = args[-1] if DEBUG: print("dir: {0}".format(path)) path = self._transform_path(path) if path not in self.dir_contents: raise ftplib.error_perm dir_lines = self.dir_contents[path].split("\n") for line in dir_lines: if callback is None: print(line) else: callback(line)
Only tests in
test/test_real_ftp.py
actually talk to an FTP server. I'm going to add a test there.
The problem should be fixed now.
Add a session factory adapter in [4ed85584a46662a70fbccfa6bbf6fbd6419c34a1](https://git.sr.ht/~sschwarzer/ftputil/commit/4ed85584a46662a70fbccfa6bbf6fbd6419c34a1 "Add session adapter for ticket #100 Under Python 2,
ftplib.FTP
(and ...") and modifyFTPHost
to use the adapter in 7518d2b15a09b11864ef7fd62bfc3eea326455ba.Allow
ftputil.test.mock_ftplib.MockSession.transfercmd
to take arest
argument because this argument is forwarded by the adapted session e80ed840fef23b9764cf45258a845eb000576197.Fix: Adapt the session, not the session factory e2a2ee52d1ab8a1d36986f7fada62c4173dc26c8. If the session factory is a function instead of a class, we can't inherit from it.