A user wrote me that when uploading files, they get wrong names in the remote file system even if they use encoding="UTF-8" in the session factory.
The problem is that the server sees just bytes and doesn't know the encoding of the paths the client sends.
The user solved the problem on their own by using session.sendcmd("OPTS UTF8 ON") in their custom FTP session class.
I can't do this by default because some FTP servers might not understand that command and might raise an exception (see also #110 for a similar but more subtle problem).
I should add this tip to the ftputil documentation, maybe in the FAQ part.
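A minimal sketch of such a custom session class, assuming Python 3.9+ (for the encoding argument of ftplib.FTP). The class name UTF8Session is only an illustration, and the host name and credentials in the usage comment are placeholders:

```python
import ftplib


class UTF8Session(ftplib.FTP):
    """FTP session that asks the server to interpret paths as UTF-8."""

    def __init__(self, host, user, password):
        # Don't pass `host` here, so that `ftplib` doesn't connect yet.
        # The `encoding` argument requires Python 3.9 or newer.
        super().__init__(encoding="UTF-8")
        self.connect(host)
        self.login(user, password)
        # Some servers reject this command. In that case, `ftplib` raises
        # `ftplib.error_perm` or `ftplib.error_temp` here.
        self.sendcmd("OPTS UTF8 ON")


# Usage with ftputil (placeholder host and credentials):
#
#   import ftputil
#
#   with ftputil.FTPHost("ftp.example.com", "user", "password",
#                        session_factory=UTF8Session) as host:
#       host.upload("local_file", "remote_file")
```

The command is sent right after the login, so it takes effect before any command that contains a path.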
Other approaches:
- Add an FTPHost flag attribute, similar to use_list_a_option, which would cause ftputil to send OPTS UTF8 ON to the server.
- Add a method FTPHost.session_command, which you could use as FTPHost.session_command("OPTS UTF8 ON"). You'd need to call this method right after creating the FTPHost instance. The downside (or upside, depending on your point of view) of this approach is that it adds a low-level API to the otherwise high-level ftputil user-facing APIs.
Regarding the second list item, theoretically it would be possible to go further and expose the session attribute of FTPHost as FTPHost.session. However, I wonder if this will unnecessarily encourage some users to abuse the API to do arbitrary things with the enclosed session, possibly interfering with other ftputil APIs.
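Purely as an illustration of the second approach, here is a rough sketch of what a session_command method might look like. HostSketch is a hypothetical stand-in, not ftputil's real FTPHost class, and the assumption is that the host object holds an ftplib-compatible session:

```python
class HostSketch:
    """Hypothetical stand-in for `FTPHost`, reduced to the proposed method."""

    def __init__(self, session):
        # `session` is assumed to offer `sendcmd`, like `ftplib.FTP` does.
        self._session = session

    def session_command(self, command):
        """Send `command` unchanged to the server and return its reply."""
        return self._session.sendcmd(command)
```

The method only forwards the command string, which is exactly why it would count as a low-level API: the host object has no idea what the command changes on the server side.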
Related documentation:
- RFC 2640, Internationalization of the File Transfer Protocol
- Draft on UTF-8 option, UTF-8 Option for FTP
Add an FTPHost flag attribute, similar to use_list_a_option, which would cause ftputil to send OPTS UTF8 ON to the server.
Probably this isn't a good idea. ;-) The use_list_a_option flag is used whenever ftputil gets a directory listing. On the other hand, if sending OPTS UTF8 ON is desired, this must happen before using the first command that uses paths. Therefore, the ftputil API should be a method call, not an attribute assignment. (Technically, we could trigger the command sending with the assignment to an attribute by using __setattr__, but that's unnecessary magic and confusing.)
Add a method FTPHost.session_command, which you could use as FTPHost.session_command("OPTS UTF8 ON").
I don't like that this kind of decouples the setting of the UTF-8 encoding in the session factory from calling the FTPHost method, even though sending OPTS UTF8 ON only makes sense if the session uses UTF-8 path encoding.
So another possibility would be to add a flag send_opts_utf8_on to session_factory, which could be used together with encoding="UTF-8". If the flag is set, the factory would send OPTS UTF8 ON. Unfortunately, for backward compatibility the flag needs to default to False. ftputil should raise an error if send_opts_utf8_on=True is used with an encoding different from UTF8, UTF-8, utf8 or utf-8.
Like encoding, I plan to make send_opts_utf8_on a ternary setting.
The new behavior of session.session_factory, depending on the arguments, is going to be:
| encoding | send_opts_utf8_on | behavior |
|----------|-------------------|----------|
| None | None | use "native" encoding of Python version; don't send OPTS command |
| None | False | use "native" encoding of Python version; don't send OPTS command |
| None | True | raise exception for invalid argument combination (1) |
| non-UTF8 | None | don't send OPTS command |
| non-UTF8 | False | don't send OPTS command |
| non-UTF8 | True | raise exception for invalid argument combination |
| UTF8 | None | send OPTS command wrapped in try ... except (PermanentError, TemporaryError): pass; i.e. ignore errors from OPTS command (2) |
| UTF8 | False | don't send OPTS command |
| UTF8 | True | send OPTS command without wrapping it in try ... except (3) |
(1) The encoding None is passed to ftplib and we don't check further what the actual resulting encoding is. If you're on Python 3.9+ and want UTF-8 encoding combined with OPTS UTF8 ON, pass the encoding explicitly.
(2) I wasn't sure if this could still lead to problems, depending on the used FTP server. However, I think the chance of problems is rather small, likely smaller than getting invalid paths because OPTS UTF8 ON isn't sent.
(3) Assume that the user knows what they're doing.
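The table can be sketched as a small decision function. This is not ftputil code: the function names, the returned strings and the use of ValueError are assumptions for illustration. The sketch uses ftplib's error_perm and error_temp in place of ftputil's PermanentError and TemporaryError. It also recognizes UTF-8 via codecs.lookup, which accepts more aliases than the four spellings listed earlier.

```python
import codecs
import ftplib


def _is_utf8(encoding):
    """Return `True` if `encoding` names UTF-8 (`None` doesn't count)."""
    if encoding is None:
        return False
    try:
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        return False


def plan_opts_command(encoding=None, send_opts_utf8_on=None):
    """
    Return "skip", "send_ignore_errors" or "send_strict", following the
    table. Raise `ValueError` for the invalid argument combinations.
    """
    if not _is_utf8(encoding):
        if send_opts_utf8_on:
            raise ValueError(
                "send_opts_utf8_on=True requires an explicit UTF-8 encoding"
            )
        return "skip"
    if send_opts_utf8_on is None:
        return "send_ignore_errors"
    return "send_strict" if send_opts_utf8_on else "skip"


def apply_plan(session, plan):
    """Send `OPTS UTF8 ON` on `session` as described by `plan`."""
    if plan == "skip":
        return
    try:
        session.sendcmd("OPTS UTF8 ON")
    except (ftplib.error_perm, ftplib.error_temp):
        # Only the strict variant lets server errors propagate.
        if plan == "send_strict":
            raise
```

Separating the decision from the sending keeps the nine table rows checkable without a server connection.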
After coming across RFC 2640, I learned that we can query whether the server supports the OPTS UTF8 ON command.
Therefore, remove the send_opts_utf8_on argument of session_factory and use the result from the FEAT command to control whether OPTS UTF8 ON should be sent.
There's still a (probably very small) chance that the server claims UTF-8 support in the FEAT output, but actually returns an error for OPTS UTF8 ON. However, instead of adding yet another ternary control argument, accept the risk of getting an unwanted exception from the server. If that happens, client code can still use a custom session factory not generated with session_factory.
I also considered wrapping sendcmd("OPTS UTF8 ON") in try: ... except (PermanentError, TemporaryError): pass, but decided that we should trust the server if it tells us in the FEAT output that the UTF8 feature is supported.
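A rough sketch of the FEAT-based approach on top of a plain ftplib session. The function names are made up for this example, and treating a failing FEAT command as "no UTF-8 support" is an assumption, not necessarily what ftputil does:

```python
import ftplib


def server_supports_utf8(session):
    """Return `True` if the `FEAT` reply of `session` lists `UTF8`."""
    try:
        response = session.sendcmd("FEAT")
    except (ftplib.error_perm, ftplib.error_temp):
        # Assumption: a server without `FEAT` support doesn't support
        # `OPTS UTF8 ON` either.
        return False
    # `ftplib` returns a multi-line reply as one string with the lines
    # separated by newlines. The first and last line are the status lines;
    # the feature lines in between start with a space.
    features = [line.strip().upper() for line in response.splitlines()[1:-1]]
    return "UTF8" in features


def enable_utf8_if_supported(session):
    """Send `OPTS UTF8 ON` if the server announces the `UTF8` feature."""
    if not server_supports_utf8(session):
        return False
    # No `try ... except` here: trust the server's `FEAT` output.
    session.sendcmd("OPTS UTF8 ON")
    return True
```

With this, an error from OPTS UTF8 ON reaches the caller unchanged, matching the decision to trust the FEAT output.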