In case if there is an entry with an empty name or a name consisting of
spaces only in the certain folder, listdir() is unable to correctly
parse the response from FTP server. This leads to ValueError
exception.
Use case of the server response (after 23:35 there are 4 spaces):
drwx------ 2 1021 601 4096 Nov 14 23:35
d-w------- 5 1021 601 4096 Oct 11 2011 test
Any attempt to list these items with listdir() or to remove the holding folder with rmtree() raises the exception:
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
File "C:\Python27\lib\site-packages\ftputil\__init__.py", line 759, in rmtree
names = self.listdir(path)
File "C:\Python27\lib\site-packages\ftputil\__init__.py", line 863, in listdir
return self._stat._listdir(path)
File "C:\Python27\lib\site-packages\ftputil\ftp_stat.py", line 602, in _listdir
return self.__call_with_parser_retry(self._real_listdir, path)
File "C:\Python27\lib\site-packages\ftputil\ftp_stat.py", line 578, in __call_with_parser_retry
result = method(*args, **kwargs)
File "C:\Python27\lib\site-packages\ftputil\ftp_stat.py", line 460, in _real_listdir
for stat_result in self._stat_results_from_dir(path):
File "C:\Python27\lib\site-packages\ftputil\ftp_stat.py", line 436, in _stat_results_from_dir
self._host.time_shift())
File "C:\Python27\lib\site-packages\ftputil\ftp_stat.py", line 297, in parse_line
year_or_time, name = self._split_line(line)
ValueError: need more than 8 values to unpack
SYST:
215 UNIX Type: L8
Looks strange but at my server I was able to create a new folder with an empty name using FTPHost.mkdir(" ").
Regards, Alexander
Hi Alexander, thanks for the report.
That's quite an unusual use case, I suppose, but I agree that ftputil ideally should be able to handle the situation. This reminds me of ticket #60, which was caused by leading whitespace in a file name. Please read the comments there.
The basic problem is that ftputil only gets the directory listing from the server; this listing is the same you get with
ls
in a command line FTP client. As far as I know there's no formal standard for this listing. It just happens to be the same for most FTP servers by convention.An attempt to solve your problem might be to change ftputil's parsing routines to use regular expressions instead of just
line.split
. But if I, for example, assume exactly one space between a timestamp and a file name and a server puts two spaces instead of one between the timestamp and the name, every filename would seem to have a leading space.Interestingly, at least some FTP servers also don't like whitespace in names. When I tried to create a directory consisting of only one space, the FTP server I use for testing changed the space to an underscore.
I closed ticket #60 as WONTFIX for the above reasons. I'll probably do the same with this one unless you have a good idea how leading or trailing whitespace in file names could be handled.
That said, if I close this one as WONTFIX, the problem with leading/trailing whitespace should be mentioned in the documentation and the way of reporting the problem should be improved (e. g. by raising a
ParserError
and hint at the whitespace problem in the exception message).
Thanks for the reply, Stefan.
At least I have one idea regarding this. Since it is almost impossible to make universal parsing of a directory listing (which is more than understandable), what if we add a possibility to skip an item with inappropriate name. I think it is better to list all directory items without a "bad" item than not to list the directory at all. I believe that could be done using exception handling for
ValueError
.I am a Windows user and I utilize Total Commander and WinSCP for working with FTP. Both software list all items except the one with the "bad" name. However, they are unable to remove the directory holding that item, but that's not a big problem IMO.
To be on the safe side we can add a setting (globally or for
listdir
only) of skipping "bad" entries or raising aParserError
.
Hey, that's an excellent idea. Thanks a lot! :-)
I don't know yet if I'll add the flag, but skipping problematic entries instead of aborting the whole listing operation definitely makes sense to me. I wonder why I didn't have this idea myself.
I've committed changeset 11506029e0290b45ad2873d48720da16fcb5619c which makes ftputil raise a
ParserError
instead of aValueError
if the name consists of whitespace only.
Regarding ignoring invalid lines:
The bad news is that I won't implement this for now. The documented behavior of
parse_line
is that it returns aParseResult
instance or raises aParserError
if it can't parse the line.The good news is that there's a relatively simple workaround. You can use the documented parser API to implement a custom parser which ignores invalid lines (although this isn't the intended use for
ignores_line
):# Untested code! import ftputil from ftputil import ftp_error from ftputil import ftp_stat class TolerantUnixParser(ftp_stat.UnixParser): """ This parser ignores invalid lines which would otherwise abort parsing of a directory listing with a `ParserError`. """ def ignores_line(self, line): if super(TolerantUnixParser, self).ignores_line(line): return True try: # Time shift shouldn't be of interest here. self.parse_line(line) except ftp_error.ParserError: return True else: return False ftp_host = ftputil.FTPHost(...) ftp_host.set_parser(TolerantUnixParser())
A disadvantage of the approach is that each valid line is parsed twice. However, this probably won't matter much in comparison to network latency, unless you're on a network with gigabit-per-second bandwidth.
Replying to schwa:
The bad news is that I won't implement this for now. The documented behavior of
parse_line
is that it returns aParseResult
instance or raises aParserError
if it can't parse the line.Actually, I've implemented something in changeset 935ddb1b0d14a9958606660fa0a2158210b1d287 *, but I'm not really happy with it for the reason in the commit message. Do you have other ideas?
I think it's a good idea to ignore parser errors only via an explicitly-set parser. If ignoring invalid lines would be the default, the automatic parser switching from Unix to MS format would no longer work. (ftputil tries the Unix parser first, and if this raises a
ParserError
, tries the parser for the Microsoft format. Setting a parser explicitly disables this automatic switching.)* Please ignore the
TolerantUnixParser
inftp_stat.py
. I only wrote this there to see how it would look like before I pasted it into the issue tracker in the previous comment. The class wasn't intended to go into the actual ftputil code.
I reverted the mentioned change 935ddb1b0d14a9958606660fa0a2158210b1d287 because I think it's rather awkward (see the commit message). If a user really wants to ignore whitespace-only names, he/she can still use the custom parser workaround from comment 6.
I set this ticket to "fixed" because I changed the
ValueError
to the documentedParserError
(see comment 5).