~sschwarzer/ftputil#157: 
Tip: Send `OPTS UTF8 ON` to server

A user wrote me that when uploading files, they get wrong names in the remote file system even if they use encoding="UTF-8" in the session factory.

The problem is that the server sees just bytes and doesn't know the encoding of the paths the client sends.

The user solved the problem on their own by using session.sendcmd("OPTS UTF8 ON") in their custom FTP session class.
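The user's actual session class isn't shown here, so the following is only a minimal sketch of what such a class might look like. The class name `Utf8Session` and its constructor signature are assumptions for illustration. The `encoding` keyword of `ftplib.FTP` requires Python 3.9 or later.

```python
import ftplib


class Utf8Session(ftplib.FTP):
    """Hypothetical session class that asks the server to use UTF-8 paths."""

    def __init__(self, host, user, password, port=21):
        # Encode and decode paths as UTF-8 on the client side (Python 3.9+).
        super().__init__(encoding="UTF-8")
        self.connect(host, port)
        self.login(user, password)
        # Tell the server that paths are UTF-8. A server that doesn't know
        # the command answers with an error, which ftplib raises here.
        self.sendcmd("OPTS UTF8 ON")


# Usage with ftputil (not executed here, needs a reachable server):
#   with ftputil.FTPHost(host, user, password,
#                        session_factory=Utf8Session) as ftp_host:
#       ...
```

Because the command is sent in the constructor, it reaches the server before ftputil issues any command that contains a path.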

I can't do this by default because some FTP servers might not understand that command and might respond with an error, which the client would see as an exception (see also #110 for a similar but more subtle problem).

Status: RESOLVED IMPLEMENTED
Submitter: ~sschwarzer
Assigned to: No-one
Submitted: 10 months ago
Updated: 8 months ago
Labels: bug, enhancement, library

~sschwarzer 10 months ago

I should add this tip to the ftputil documentation, maybe in the FAQ part.

~sschwarzer 10 months ago*

Other approaches:

  • Add an FTPHost flag attribute, similar to use_list_a_option, which would cause ftputil to send OPTS UTF8 ON to the server.
  • Add a method FTPHost.session_command, which you could use as FTPHost.session_command("OPTS UTF8 ON"). You'd need to call this method right after creating the FTPHost instance. The downside (or upside, depending on your point of view) of this approach is that it adds a low-level API to the otherwise high-level ftputil user-facing APIs.

Regarding the second list item, theoretically it would be possible to go further and expose the session attribute of FTPHost as FTPHost.session. However, I wonder if this will unnecessarily encourage some users to abuse the API to do arbitrary things with the enclosed session, possibly interfering with other ftputil APIs.

~sschwarzer 10 months ago*

Related documentation:

~sschwarzer 10 months ago

> Add an FTPHost flag attribute, similar to use_list_a_option, which would cause ftputil to send OPTS UTF8 ON to the server.

Probably this isn't a good idea. ;-) The use_list_a_option flag is used whenever ftputil retrieves a directory listing. On the other hand, if sending OPTS UTF8 ON is desired, this must happen before the first command that uses paths. Therefore, the ftputil API should be a method call, not an attribute assignment. (Technically, we could trigger sending the command on assignment to an attribute by using __setattr__, but that's unnecessary magic and confusing.)

~sschwarzer 10 months ago

> Add a method FTPHost.session_command, which you could use as FTPHost.session_command("OPTS UTF8 ON").

I don't like that this decouples setting the UTF-8 encoding in the session factory from calling the FTPHost method, even though sending OPTS UTF8 ON only makes sense if the session uses UTF-8 path encoding.

So another possibility would be to add a flag send_opts_utf8_on to session_factory, which could be used together with encoding="UTF-8". If the flag is set, the factory would send OPTS UTF8 ON. Unfortunately, for backward compatibility the flag needs to default to False. ftputil should raise an error if send_opts_utf8_on=True is used with an encoding other than UTF8, UTF-8, utf8 or utf-8.

~sschwarzer 10 months ago*

Like encoding, I plan to make send_opts_utf8_on a ternary setting.

The new behavior of session.session_factory, depending on the arguments, is going to be:

| encoding | send_opts_utf8_on | behavior |
|----------|-------------------|----------|
| None     | None              | use "native" encoding of Python version; don't send OPTS command |
| None     | False             | use "native" encoding of Python version; don't send OPTS command |
| None     | True              | raise exception for invalid argument combination (1) |
| non-UTF8 | None              | don't send OPTS command |
| non-UTF8 | False             | don't send OPTS command |
| non-UTF8 | True              | raise exception for invalid argument combination |
| UTF8     | None              | send OPTS command wrapped in try ... except (PermanentError, TemporaryError): pass; i.e. ignore errors from OPTS command (2) |
| UTF8     | False             | don't send OPTS command |
| UTF8     | True              | send OPTS command without wrapping it in try ... except (3) |

(1) The encoding None is passed to ftplib and we don't check further what the actual resulting encoding is. If you're on Python 3.9+ and want UTF-8 encoding combined with OPTS UTF8 ON, pass the encoding explicitly.

(2) I wasn't sure if this could still lead to problems, depending on the FTP server used. However, I think the chance of problems is rather small, likely smaller than the chance of getting invalid paths because OPTS UTF8 ON isn't sent.

(3) Assume that the user knows what they're doing.
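To make the planned argument combinations easier to check, here is a rough sketch of that decision logic as plain functions. This is not ftputil code: the names `plan_opts_utf8` and `apply_plan` and the result strings are made up for illustration, `ValueError` is an arbitrary choice for the "invalid argument combination" exception, and ftplib's `error_perm` and `error_temp` stand in for PermanentError and TemporaryError.

```python
import ftplib

# Spellings accepted as UTF-8 (compared case-insensitively).
UTF8_NAMES = {"utf8", "utf-8"}


def is_utf8(encoding):
    """Return True if `encoding` is an explicit UTF-8 spelling."""
    return encoding is not None and encoding.lower() in UTF8_NAMES


def plan_opts_utf8(encoding, send_opts_utf8_on):
    """Return "skip", "try" or "send" for one combination of arguments."""
    if send_opts_utf8_on is True and not is_utf8(encoding):
        # Covers both encoding=None and explicit non-UTF-8 encodings.
        raise ValueError("send_opts_utf8_on=True requires a UTF-8 encoding")
    if not is_utf8(encoding) or send_opts_utf8_on is False:
        return "skip"
    if send_opts_utf8_on is None:
        return "try"  # send, but ignore errors from the server
    return "send"  # send and let errors propagate


def apply_plan(session, plan):
    """Send OPTS UTF8 ON on an ftplib-style session according to `plan`."""
    if plan == "skip":
        return
    try:
        session.sendcmd("OPTS UTF8 ON")
    except (ftplib.error_perm, ftplib.error_temp):
        if plan == "send":
            raise
```

With these assumptions, `plan_opts_utf8("UTF-8", None)` gives `"try"`, while `plan_opts_utf8(None, True)` raises, because an encoding of None is never inspected further.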

~sschwarzer 8 months ago

After coming across RFC 2640, I learned that we can query whether the server supports the OPTS UTF8 ON command.

Therefore, remove the send_opts_utf8_on argument of session_factory and use the result from the FEAT command to control whether OPTS UTF8 ON should be sent.

There's still a (probably very small) chance that the server claims UTF-8 support in the FEAT output, but actually returns an error for OPTS UTF8 ON. However, instead of adding yet another ternary control argument, I'd rather risk getting an unwanted exception from the server. If that happens, client code can still use a custom session factory that isn't generated with session_factory.

I also considered wrapping sendcmd("OPTS UTF8 ON") in try: ... except (PermanentError, TemporaryError): pass, but decided that we should trust the server if it tells us in the FEAT output that the UTF8 feature is supported.
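The FEAT-based approach could look roughly like the sketch below. It is not the actual ftputil implementation. The function names are hypothetical, and treating a rejected FEAT command as "no features" is an assumption of this sketch, not something decided in this ticket. In line with the decision above, OPTS UTF8 ON itself is sent without a try ... except.

```python
import ftplib


def server_features(session):
    """Return the feature names from the server's FEAT reply, upper-cased."""
    try:
        response = session.sendcmd("FEAT")
    except ftplib.error_perm:
        # Assumption: a server that rejects FEAT advertises no features.
        return set()
    features = set()
    # A multi-line reply has a status line first and last; the feature
    # lines in between are indented with a space.
    for line in response.splitlines()[1:-1]:
        parts = line.split()
        if parts:
            features.add(parts[0].upper())
    return features


def enable_utf8_if_supported(session):
    """Send OPTS UTF8 ON only if the server lists UTF8 in its FEAT reply."""
    if "UTF8" in server_features(session):
        # Deliberately not wrapped in try/except: trust the FEAT output.
        session.sendcmd("OPTS UTF8 ON")
        return True
    return False
```

Here `session` is expected to behave like an `ftplib.FTP` instance, whose `sendcmd` returns a multi-line reply as a single string with the lines separated by newlines.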

~sschwarzer REPORTED IMPLEMENTED 8 months ago
