~sschwarzer/ftputil#152: 
`ParserError` for newlines in directory or file names

As Simon Cox describes in these two list posts, a newline in a directory or file name leads to a ParserError.

The reason is that ftputil always gets the contents of a whole directory, which is a (possibly multiline) string, for example

-rw-r--r--. 1 schwa schwa   4036 Aug 22 11:03 file1
-rw-r--r--. 1 schwa schwa   4036 Aug 22 11:03 file
with linebreak
-rw-r--r--. 1 schwa schwa   4036 Aug 22 11:03 file2

Since the parser's parse_line method is called on each line separately, the line "with linebreak" causes the ParserError.
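
To illustrate, here is a minimal sketch of line-by-line parsing. It doesn't use ftputil's actual parser: the `ParserError` class and the `parse_line` function below are simplified stand-ins (assumptions for this sketch) that only check for the nine whitespace-separated fields of a Unix-style listing line.

```python
class ParserError(Exception):
    """Stand-in for `ftputil.error.ParserError` (assumption for this sketch)."""


def parse_line(line):
    """Hypothetical simplified parser: return the name from a Unix-style line."""
    # Eight metadata fields, then the name (which may contain spaces).
    parts = line.split(None, 8)
    if len(parts) < 9:
        raise ParserError(f"can't parse line {line!r}")
    return parts[8]


listing = (
    "-rw-r--r--. 1 schwa schwa   4036 Aug 22 11:03 file1\n"
    "-rw-r--r--. 1 schwa schwa   4036 Aug 22 11:03 file\n"
    "with linebreak\n"
    "-rw-r--r--. 1 schwa schwa   4036 Aug 22 11:03 file2\n"
)

# Parse each line on its own and collect the lines that fail.
failing_lines = []
for line in listing.splitlines():
    try:
        parse_line(line)
    except ParserError:
        failing_lines.append(line)

print(failing_lines)  # ['with linebreak']
```

The second half of the name arrives as a line of its own and has none of the metadata fields, so it can't be parsed.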

Status: REPORTED
Submitter: ~sschwarzer
Assigned to: No-one
Submitted: 2 years ago
Updated: 2 years ago
Labels: No labels applied.

~sschwarzer 2 years ago*

I don't think it would be a good idea to ignore the "redundant" lines. There are several reasons:

  • The initial line for the file will be incomplete, so the presumed directory or file name usually would refer to a non-existent directory or file.
  • "Unparsable" lines might come from invalid data in the directory listing or even from something like "total 11" lines, so ignoring them could hide real problems. (This specific pattern is already ignored by the UnixParser.)
  • Automatic parser switching relies on a ParserError for a line that the initially set parser can't parse.

Because of these reasons and because newlines in directory and file names should be very rare, I don't intend to add a heuristic to the parser to join lines.
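
The third point can be sketched as follows. This is not ftputil's actual switching logic, only a simplified illustration: `ParserError`, `parse_unix`, `parse_ms` and `list_names` are hypothetical stand-ins, and the sample lines merely imitate an MS-style listing.

```python
class ParserError(Exception):
    """Stand-in for `ftputil.error.ParserError` (assumption for this sketch)."""


def parse_unix(line):
    """Hypothetical simplified Unix-format parser: return the name."""
    parts = line.split(None, 8)
    if len(parts) < 9:
        raise ParserError(f"can't parse line {line!r}")
    return parts[8]


def parse_ms(line):
    """Hypothetical simplified MS-format parser: return the name."""
    parts = line.split(None, 3)
    if len(parts) < 4:
        raise ParserError(f"can't parse line {line!r}")
    return parts[3]


def list_names(lines, parsers, ignore_errors=False):
    """Try each parser on the whole listing; a ParserError selects the next one."""
    for parser in parsers:
        names = []
        try:
            for line in lines:
                try:
                    names.append(parser(line))
                except ParserError:
                    if not ignore_errors:
                        raise
        except ParserError:
            continue  # This parser doesn't fit; try the next one.
        return names
    raise ParserError("no parser could parse the listing")


ms_lines = [
    "07-17-00  02:08PM             12266720 test.exe",
    "10-23-01  03:25PM       <DIR>          WindowsXP",
]

# The Unix parser fails on the first line, so the MS parser takes over.
print(list_names(ms_lines, [parse_unix, parse_ms]))
# ['test.exe', 'WindowsXP']

# If unparsable lines are silently dropped, the switch never happens
# and the listing appears to be empty.
print(list_names(ms_lines, [parse_unix, parse_ms], ignore_errors=True))
# []
```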

That said, there's a workaround to at least ignore the subsequent lines that come from multiline directory or file names. It's possible to write custom parsers and set them with set_parser. A custom parser wouldn't be able to join lines, but it would be possible to ignore lines that would usually cause a ParserError by overriding the ignores_line method.

Therefore, the (untested) code would look something like this:

import ftputil
import ftputil.error
import ftputil.stat

# Parser base class for the format the _server_ uses.
# Either `ftputil.stat.UnixParser` or `ftputil.stat.MSParser`.
parser_base_class = ...


class MyParser(parser_base_class):
    def ignores_line(self, line):
        # Keep ignoring the lines the base parser already skips, e.g. "total 11".
        if super().ignores_line(line):
            return True
        # Additionally skip every line the base parser can't parse.
        try:
            self.parse_line(line)
        except ftputil.error.ParserError:
            return True
        return False


with ftputil.FTPHost(...) as ftp_host:
    ftp_host.set_parser(MyParser())
    ...
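
To show what this workaround does and doesn't achieve without needing an FTP server, here is a self-contained sketch. `StubUnixParser` is a hypothetical, heavily simplified stand-in for `ftputil.stat.UnixParser`, `ParserError` stands in for `ftputil.error.ParserError`, and `names` is a helper made up for this example.

```python
class ParserError(Exception):
    """Stand-in for `ftputil.error.ParserError` (assumption for this sketch)."""


class StubUnixParser:
    """Hypothetical simplified stand-in for `ftputil.stat.UnixParser`."""

    def ignores_line(self, line):
        # Skip blank lines and "total 11"-style lines.
        return not line.strip() or line.startswith("total ")

    def parse_line(self, line):
        # Eight metadata fields, then the name.
        parts = line.split(None, 8)
        if len(parts) < 9:
            raise ParserError(f"can't parse line {line!r}")
        return parts[8]


class MyParser(StubUnixParser):
    def ignores_line(self, line):
        if super().ignores_line(line):
            return True
        try:
            self.parse_line(line)
        except ParserError:
            return True
        return False


def names(listing, parser):
    """Return the names from all lines the parser doesn't ignore."""
    return [
        parser.parse_line(line)
        for line in listing.splitlines()
        if not parser.ignores_line(line)
    ]


listing = (
    "total 11\n"
    "-rw-r--r--. 1 schwa schwa   4036 Aug 22 11:03 file1\n"
    "-rw-r--r--. 1 schwa schwa   4036 Aug 22 11:03 file\n"
    "with linebreak\n"
    "-rw-r--r--. 1 schwa schwa   4036 Aug 22 11:03 file2\n"
)

print(names(listing, MyParser()))  # ['file1', 'file', 'file2']
```

The stray line no longer raises a ParserError, but the second entry is reported as "file", not under its real name, so operations on that entry would still refer to a non-existent file.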