A user wrote to me that, when uploading files, they get wrong names in the remote file system even if they use `encoding="UTF-8"` in the session factory.
The problem is that the server sees just bytes and doesn't know the encoding of the paths the client sends.
The user solved the problem on their own by using `session.sendcmd("OPTS UTF8 ON")` in their custom FTP session class.
I can't do this by default because some FTP servers might not understand that command and might raise an exception (see also #110 for a similar but more subtle problem).
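For illustration, the user's workaround might look roughly like the sketch below. The class name `Utf8Session` is hypothetical, and the sketch assumes Python 3.9+ (where `ftplib.FTP` accepts an `encoding` argument) and a server that accepts `OPTS UTF8 ON`; on a server that rejects the command, `sendcmd` raises an `ftplib` error.

```python
import ftplib


class Utf8Session(ftplib.FTP):
    """Hypothetical custom session class that asks the server for UTF-8 paths."""

    def __init__(self, host, user, password):
        # Python 3.9+: make ftplib itself encode paths as UTF-8.
        super().__init__(encoding="utf-8")
        self.connect(host)
        self.login(user, password)
        # Tell the server that the path bytes we send are UTF-8.
        # This raises an ftplib error if the server rejects the command.
        self.sendcmd("OPTS UTF8 ON")
```

Such a class would be passed to ftputil as the session factory, for example `ftputil.FTPHost(host, user, password, session_factory=Utf8Session)`.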
- Add an `FTPHost` flag attribute, similar to `use_list_a_option`, which would cause ftputil to send `OPTS UTF8 ON` to the server.
- Add a method `FTPHost.session_command`, which you could use as `FTPHost.session_command("OPTS UTF8 ON")`. You'd need to call this method right after creating the `FTPHost` instance. The downside (or upside, depending on your point of view) of this approach is that it adds a low-level API to the otherwise high-level ftputil user-facing APIs.
Regarding the second list item, theoretically it would be possible to go further and expose the session itself as `FTPHost.session`. However, I wonder if this would unnecessarily encourage some users to abuse the API to do arbitrary things with the enclosed session, possibly interfering with other ftputil APIs.
Add an FTPHost flag attribute, similar to use_list_a_option, which would cause ftputil to send OPTS UTF8 ON to the server.
Probably this isn't a good idea. ;-) The `use_list_a_option` flag is consulted whenever ftputil fetches a directory listing. On the other hand, if sending `OPTS UTF8 ON` is desired, this must happen before the first command that uses paths. Therefore, the ftputil API should be a method call, not an attribute assignment. (Technically, we could trigger sending the command on assignment to an attribute by using `__setattr__`, but that's unnecessary magic and confusing.)
Add a method FTPHost.session_command, which you could use as FTPHost.session_command("OPTS UTF8 ON").
I don't like that this decouples setting the UTF-8 encoding in the session factory from calling the `FTPHost` method, even though sending `OPTS UTF8 ON` only makes sense if the session uses UTF-8 path encoding.
So another possibility would be to add a flag `send_opts_utf8_on` to `session_factory`, which could be used together with `encoding="UTF-8"`. If the flag is set, the factory would send `OPTS UTF8 ON`. Unfortunately, for backward compatibility the flag needs to default to `False`. ftputil should raise an error if `send_opts_utf8_on=True` is used with an encoding different from UTF-8. I plan to make `send_opts_utf8_on` a ternary setting.
The new behavior of `session.session_factory`, depending on the arguments, is going to be:

| encoding | send_opts_utf8_on | behavior |
|----------|-------------------|----------|
| None | None | use "native" encoding of Python version; don't send `OPTS` command |
| None | False | use "native" encoding of Python version; don't send `OPTS` command |
| None | True | raise exception for invalid argument combination (1) |
| non-UTF8 | None | don't send `OPTS` command |
| non-UTF8 | False | don't send `OPTS` command |
| non-UTF8 | True | raise exception for invalid argument combination |
| UTF8 | None | send `OPTS` command wrapped in `try ... except (PermanentError, TemporaryError): pass`; i.e. ignore errors from sending the command (2) |
| UTF8 | False | don't send `OPTS` command |
| UTF8 | True | send `OPTS` command without wrapping it in `try ... except` (3) |

(1) The encoding `None` is passed to `ftplib` and we don't check further what the actual resulting encoding is. If you're on Python 3.9+ and want UTF-8 encoding combined with `OPTS UTF8 ON`, pass the encoding explicitly.
(2) I wasn't sure if this could still lead to problems, depending on the FTP server used. However, I think the chance of problems is rather small, likely smaller than the chance of getting invalid paths because `OPTS UTF8 ON` isn't sent.
(3) Assume that the user knows what they're doing.
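
The argument combinations above can be summarized as a small decision function. This is only a sketch of the planned logic, not ftputil code: the function name `plan_opts_utf8` and the returned strings are made up for illustration, and recognizing UTF-8 spellings via `codecs.lookup` is an assumption.

```python
import codecs


def plan_opts_utf8(encoding, send_opts_utf8_on):
    """Return what to do about `OPTS UTF8 ON` (hypothetical helper).

    Result is one of "skip", "send_ignoring_errors" or "send_strict".
    """
    # `None` means "native" encoding; we deliberately don't try to find out
    # what that resolves to, so it never counts as UTF-8 here.
    is_utf8 = encoding is not None and codecs.lookup(encoding).name == "utf-8"
    if send_opts_utf8_on is True:
        if not is_utf8:
            raise ValueError(
                "send_opts_utf8_on=True requires an explicit UTF-8 encoding"
            )
        # The user asked for the command explicitly; let errors propagate.
        return "send_strict"
    if send_opts_utf8_on is None and is_utf8:
        # Default for UTF-8: try the command, but ignore server errors.
        return "send_ignoring_errors"
    # `False`, or `None` combined with a non-UTF-8 or native encoding.
    return "skip"
```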
After coming across RFC 2640, I learned that we can query whether the server supports the `OPTS UTF8 ON` command.
Therefore, remove the `send_opts_utf8_on` argument from `session_factory` and use the result from the `FEAT` command to control whether `OPTS UTF8 ON` should be sent.
There's still a (probably very small) chance that the server pretends to support UTF-8 in the `FEAT` output, but actually returns an error for `OPTS UTF8 ON`. However, instead of adding yet another ternary control argument, accept the risk that we get an unwanted exception from the server. If that happens, client code can still use a custom session factory not generated with `session_factory`.
I also considered wrapping `sendcmd("OPTS UTF8 ON")` in `try: ... except (PermanentError, TemporaryError): pass`, but decided that we should trust the server if it tells us in the `FEAT` output that the `UTF8` feature is supported.
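
A minimal sketch of the `FEAT`-based approach, using plain `ftplib`, could look like this. The function names are hypothetical and this is not necessarily how ftputil implements it. The sketch relies on the `FEAT` reply listing one feature per line (each preceded by a space), and it assumes that a server rejecting `FEAT` itself should be treated as not supporting `UTF8`.

```python
import ftplib


def server_supports_utf8(feat_response):
    """Return `True` if a `FEAT` reply lists the `UTF8` feature."""
    for line in feat_response.splitlines():
        # Feature lines look like " UTF8"; the first and last lines of
        # the reply carry the status code and never match.
        if line.strip().upper() == "UTF8":
            return True
    return False


def enable_utf8_if_supported(session):
    """Send `OPTS UTF8 ON` only if the server advertises `UTF8`.

    `session` is an `ftplib.FTP`-like object. Returns `True` if the
    command was sent.
    """
    try:
        feat_response = session.sendcmd("FEAT")
    except ftplib.error_perm:
        # Assumption: no `FEAT` support means no advertised `UTF8` feature.
        return False
    if not server_supports_utf8(feat_response):
        return False
    # Deliberately not wrapped in `try ... except`: if the server
    # advertises `UTF8`, we trust it and let a failure surface.
    session.sendcmd("OPTS UTF8 ON")
    return True
```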