Wrong handling of non-ASCII characters in paths under Python 2

For ticket #96, I was looking into the exact behavior of ftputil when encoding and decoding file system paths for FTPHost.listdir.

It turned out that under Python 2 ftputil doesn't work correctly with paths that contain non-ASCII characters. The intended behavior is:

Path argument is a byte string:

  • Characters are sent as they are, even if they are non-ASCII characters (i. e. character codes greater than 127).

  • Returned lines are byte strings (latin1-encoded, i. e. the data sent by the server) if the listdir argument was a byte string.

Path argument is a unicode string:

  • Unicode strings are assumed to be decoded from latin1. For example, to get the contents of the directory b'\xc3\xa4bc' (corresponding to "äbc"), the unicode string must be b'\xc3\xa4bc'.decode("latin1").

  • Returned lines are unicode strings (decoded assuming latin1 encoding).

This behavior is indeed seen when ftputil runs under Python 3. (By the way, the handling of unicode strings is strange, but it's used to be compatible with Python 3's ftplib.FTP implementation. If ftputil.FTPHost would behave differently, it couldn't work with non-ASCII paths that were created with ftplib.FTP under Python 3.)

Under Python 2, pure ASCII strings work as above regardless of being byte strings or unicode strings. However, when the strings contain characters with codes greater than 127, FTPHost.listdir raises a UnicodeEncodeError, again both for byte strings and unicode strings.

schwa (unverified)
Assigned to
8 years ago
8 years ago
bug library

schwa (unverified) 8 years ago · edit

There's a unit test test.test_host.TestAcceptEitherUnicodeOrBytes.test_listdir:

    def test_listdir(self):
        """Test whether `listdir` accepts either unicode or bytes."""
        host = self.host
        as_bytes = ftputil.tool.as_bytes
        # Unicode
        items = host.listdir("ä")
        self.assertEqual(items, ["ö", "o"])
        #  Need explicit type check for Python 2
        for item in items:
            self.assertTrue(isinstance(item, ftputil.compat.unicode_type))
        # Bytes
        items = host.listdir(as_bytes("ä"))
        self.assertEqual(items, [as_bytes("ö"), as_bytes("o")])
        #  Need explicit type check for Python 2
        for item in items:
            self.assertTrue(isinstance(item, ftputil.compat.bytes_type))

As far as I can tell this should catch the problematic behavior in the ticket description. I need to find out why the test doesn't uncover the problem.

schwa (unverified) 8 years ago · edit

The problem with the unit test is that it uses a mock session, that is, it doesn't use ftplib.FTP to access a real FTP server. Instead, the mock dir method returns lines from a multi-line string. Here's the mock implementation of ftplib.FTP.dir:

    def dir(self, *args):
        Provide a callback function for processing each line of a
        directory listing. Return nothing.
        # The callback comes last in `ftplib.FTP.dir`.
        if isinstance(args[-1], collections.Callable):
            # Get `args[-1]` _before_ removing it in the line after.
            callback = args[-1]
            args = args[:-1]
            callback = None
        # Everything before the path argument are options.
        path = args[-1]
        if DEBUG:
            print("dir: {0}".format(path))
        path = self._transform_path(path)
        if path not in self.dir_contents:
            raise ftplib.error_perm
        dir_lines = self.dir_contents[path].split("\n")
        for line in dir_lines:
            if callback is None:

Only tests in test/test_real_ftp.py actually talk to an FTP server. I'm going to add a test there.

schwa (unverified) 8 years ago · edit

The problem should be fixed now.

Register here or Log in to comment, or comment via email.