刘昶 wrote to the ftputil mailing list with the following observation:
Test environment:

    Server: FileZilla Server 0.9.50
    Client OS: Win7

    import ftputil

    # Download some files from the login directory.
    with ftputil.FTPHost("localhost", user='honglei', passwd='111111') as ftp_host:
        names = ftp_host.listdir(ftp_host.curdir)

I find that names[-1] == u'\xe8\xbf\x99\xe6\x98\xaf\xe4\xb8\xad\xe6\x96\x87.txt'. It is a UTF-8 encoded filename rather than a unicode string.
Thanks a lot for bringing this up.
Technically, this is a unicode string, but you're right in that you can see a UTF-8 encoding here.
When you use listdir in ftputil, it uses the standard library's ftplib to retrieve a directory listing. On Python 3, ftplib returns unicode strings. However, since the socket ultimately gets only bytes and ftplib doesn't know the encoding, it arbitrarily assumes latin1 encoding. Since this is an 8-bit encoding, there can't be decoding exceptions.
ftputil processes the strings returned by ftplib as they come, so ftputil in turn gives you those unicode strings that were decoded as latin1.
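To make this concrete, here is a small sketch of what happens to the bytes on the way to your program. It doesn't talk to an FTP server; the variable names are only for illustration, and it assumes the server sent the name as UTF-8.

```python
# The name as it exists on the server (assumed to be sent as UTF-8).
original = "这是中文.txt"

# What travels over the socket: plain bytes.
wire_bytes = original.encode("utf-8")

# What you get if those bytes are decoded as latin1 instead of UTF-8.
# Each byte becomes exactly one code point in the range U+0000..U+00FF.
as_seen = wire_bytes.decode("latin-1")

# latin1 maps all 256 byte values, so decoding can never fail.
all_bytes = bytes(range(256))
decoded_all = all_bytes.decode("latin-1")
```

Here `as_seen` is the same string as the one reported above, which is why it looks like "UTF-8 inside a unicode string".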
Since ftputil uses a unified API for Python 2 and 3, it applies the same unicode handling when run on Python 2.
If you know that the strings were decoded as latin1 and that the original encoding coming from the FTP server was UTF-8, you can recover the correctly decoded unicode string:
    >>> s = u'\xe8\xbf\x99\xe6\x98\xaf\xe4\xb8\xad\xe6\x96\x87.txt'
    >>> s.encode("latin1")
    b'\xe8\xbf\x99\xe6\x98\xaf\xe4\xb8\xad\xe6\x96\x87.txt'
    >>> s.encode("latin1").decode("utf8")
    '这是中文.txt'
I guess this is the name you expected.
In the general case, i.e. if you don't know the encoding, you can still recover the original byte string by encoding the name with latin1.
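If it helps, both cases could be wrapped in small helpers along these lines. This is only a sketch: `to_wire_bytes` and `redecode` are hypothetical names, not part of ftputil or ftplib, and the list of candidate encodings is an assumption you would adapt to your servers.

```python
def to_wire_bytes(name):
    """Return the bytes the server sent for a latin1-decoded name.

    This works for names that came from a latin1 decode; a string with
    code points above U+00FF would raise UnicodeEncodeError here.
    """
    return name.encode("latin-1")


def redecode(name, encodings=("utf-8",)):
    """Try to decode the original bytes with each candidate encoding.

    Returns the first successful result, or the unchanged name if none
    of the candidate encodings fits the bytes.
    """
    raw = to_wire_bytes(name)
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return name


fixed = redecode('\xe8\xbf\x99\xe6\x98\xaf\xe4\xb8\xad\xe6\x96\x87.txt')
unchanged = redecode('caf\xe9.txt')  # not valid UTF-8, so left as is
```

Falling back to the unchanged name keeps it usable with ftputil, which expects the latin1-decoded form when you pass the name back to it.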
I plan to extend the ftputil documentation to clarify what's going on here.
I fixed and expanded the documentation section "Directory and file names" in 21d9df0d26acf8a35c8950e86de37a438e1ae25c.