Hello,
I am on ftputil 5.1.0, Python 3.11.8 (linux)
Today is a special date, Feb 29, 2024. I am at timezone UTC+8.
I am getting an error in my code
object_name_list = ftp_host.listdir(folder_path)
Error message is:
> Traceback (most recent call last):
> File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 210, in _datetime
> return datetime.datetime(
> ^^^^^^^^^^^^^^^^^^
> ValueError: day is out of range for month
After digging into it I can tell that the error came from Line 308 to Line 311:
> server_utc_datetime = self._datetime(
> server_year, month, day, hour, minute, 0
> ) - datetime.timedelta(seconds=time_shift)
> st_mtime = server_utc_datetime.timestamp()
When it tried to subtract one year from the current date, an exception happened as Feb 29, 2023 is not a valid date.
I believe the error will go away tomorrow, but it would be great if it could be fixed before the next Feb 29 :)
Thank you, Stacy
Hi ~stacylo, thanks for the report!
Do you have the line from the directory listing where the error occured? You can add a
print("===", line)
at line 472. You can obscure the file name if it's necessary for privacy reasons.
Also, even if it probably doesn't matter, at which time (roughly) did you run the
listdir
call?
Also, can you please give me the values of the arguments of parse_unix_time for which the problem happens?
Is the server in a different timezone from the client (where you run ftputil)? If yes, did you set the time zone (with
FTPHost.synchronize_times
orFTPHost.set_time_shift
) before callingFTPHost.listdir
?
Hi Stefan,
I ran it sometime in the morning between 10:30 and 11:00, Feb 29, 2024. And yes, the time should not matter. I believe the error is going to occur for the whole day Feb 29, 2024.
Below is the traceback for the error so you can see which line. (Sorry about the format. Do you know how I can keep the correct line breaks in the markdown?)
Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 210, in _datetime return datetime.datetime( ^^^^^^^^^^^^^^^^^^ ValueError: day is out of range for month During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 801, in __call_with_parser_retry result = method(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 768, in _real_stat lstat_result = self._real_lstat(path, _exception_for_missing_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 729, in _real_lstat for stat_result in self._stat_results_from_dir(dirname): File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 636, in _stat_results_from_dir stat_result = self._parser.parse_line(line, self._host.time_shift()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 472, in parse_line st_mtime, st_mtime_precision = self.parse_unix_time( ^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 308, in parse_unix_time server_utc_datetime = self._datetime( ^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 218, in _datetime raise ftputil.error.ParserError( ftputil.error.ParserError: invalid datetime '2023-02-29 17:02:00' Debugging info: ftputil 5.1.0, Python 3.11.8 (linux) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 549, in parse_line st_size = int(dir_or_size) ^^^^^^^^^^^^^^^^ ValueError: invalid literal for int() with base 10: 'ftp' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/opt/project/src/main.py", line 49, in <module> main() File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__ return self.main(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main rv = self.invoke(ctx) ^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke return ctx.invoke(self.callback, **ctx.params) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke return __callback(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/project/src/main.py", line 36, in main is_success = import_orders(sap_environment=sap_environment) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/project/src/functions/import_orders.py", line 429, in import_orders is_success_1 = process_folder(sap_grpc_client=sap_grpc_client, ftp_host=ftp_host, folder_dict=folder_dict) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/project/src/functions/import_orders.py", line 372, in process_folder object_name_list = ftp_host.listdir(folder_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/host.py", line 988, in listdir items = self._stat._listdir(path) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 825, in _listdir return self.__call_with_parser_retry(self._real_listdir, path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 801, in __call_with_parser_retry result = method(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 680, in _real_listdir if not self._path.isdir(path): ^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/path.py", line 154, in isdir return self._is_file_system_entity(path, "dir") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/path.py", line 132, in _is_file_system_entity stat_result = self._host.stat(path, _exception_for_missing_path=False) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/host.py", line 1024, in stat return self._stat._stat(path, _exception_for_missing_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 845, in _stat return self.__call_with_parser_retry( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 801, in __call_with_parser_retry result = method(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 768, in _real_stat lstat_result = self._real_lstat(path, _exception_for_missing_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 719, in _real_lstat if not self._path.isdir(dirname) and not _exception_for_missing_path: ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/path.py", line 154, in isdir return self._is_file_system_entity(path, "dir") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/path.py", line 132, in _is_file_system_entity stat_result = self._host.stat(path, _exception_for_missing_path=False) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/host.py", line 1024, in stat return self._stat._stat(path, _exception_for_missing_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 845, in _stat return self.__call_with_parser_retry( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 811, in __call_with_parser_retry return method(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 768, in _real_stat lstat_result = self._real_lstat(path, _exception_for_missing_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 729, in _real_lstat for stat_result in self._stat_results_from_dir(dirname): File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 636, in _stat_results_from_dir stat_result = self._parser.parse_line(line, self._host.time_shift()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 551, in parse_line raise ftputil.error.ParserError("invalid size {}".format(dir_or_size)) ftputil.error.ParserError: invalid size ftp Debugging info: ftputil 5.1.0, Python 3.11.8 (linux)
For the time zone, my local machine is UTC+8. The FTP server should be UTC+8 as well, but I am not sure.
The values of the arguments of parse_unix_time are
- month_abbreviation='Feb'
- day='29'
- year_or_time='17:12'
- time_shift=0.0
- with_precision=True
Do you know how I can keep the correct line breaks in the markdown?
You can use literal formatting of a code block by using a line with three backticks before and after the code block. I added the backticks in your description and comment.
I think you can also use all the Markdown formatting described in the linked Markdown Guide in Sourcehut ticket descriptions and comments.
This looks weird:
Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 549, in parse_line st_size = int(dir_or_size) ^^^^^^^^^^^^^^^^ ValueError: invalid literal for int() with base 10: 'ftp'
Can you print the whole problematic line before it gets parsed (or tried to be parsed ;-) )?
This looks weird: ...
Also, this exception happens in the MS parser, but the invalid datetime is encountered in the unix parser. I haven't checked all details yet, but it could be during a parser retry, where ftputil tries the unix parser first and when it fails, falls back to the MS parser.
Anyway, I'll check all the stacktraces more closely.
Still, the whole failing line could be really helpful.
Can you print the whole problematic line before it gets parsed (or tried to be parsed ;-) )?
Yes, it is:
drwxr-xr-x 1 ftp ftp 0 Feb 29 17:31 .
Yes, it is: drwxr-xr-x 1 ftp ftp 0 Feb 29 17:31 .
Thanks! :-)
The values of the arguments of parse_unix_time are
month_abbreviation='Feb' day='29' year_or_time='17:12' time_shift=0.0 with_precision=True
If you have the time shift set to 0.0 and call
listdir
at around 11:00 in your timezone, ftputil inspects the datetime "Feb 29 17:31", assumes it can't be in the future and therefore assumes it belongs to the last year. (That's the common interpretation for dates/datetimes without a year in FTP directory listings.) Therefore, ftputil tries to subtract one year to get the actual date, which fails since 2023-02-29 is invalid.So either the FTP server reports a wrong time for the listed directory/file, the time shift is set incorrectly - or something else that I haven't thought of yet. ;-) But I think the most likely reason is that the server is in a different time zone and you have to set the time shift (server time - UTC time) on your end.
If you have write permissions in some directory on the server, you can use
FTPHost.synchronize_times
to detect and set the time shift automatically, otherwise you'd have to useFTPHost.set_time_shift
. If you can't usesynchronize_times
and don't know the time zone of the server, you could try usingftplib.FTP.mlsd
for the file. However, I don't know how the MLSD command handles time zones / offsets, i.e. if the time is UTC or local (only the former would be useful for you). Moreover, the server might not support the MLSD command.
If you can't find out the actual time shift to use and you're only interested in the names of the directories/files and never in the actual datetimes, it might be enough to set a high time shift, e.g.
24*60*60
. In that case, the time from the server should be considered to be in the past and the parsing should succeed. I know it's a hack, but I think that's the best that can be done.
Yeah okay, I think the problem has been resolved.
ftp_host.synchronize_times()
does not work for me. It threw the same error.So I did
ftp_host.set_time_shift(time_shift=28800)
beforeftp_host.listdir(folder_path)
, and it worked.It is also a good suggestion to set a high shift because all I need is the list of file names.
Thank you!!!
You're welcome! :-)
Hello Stefan,
our Python projects which uses
ftputil
were also affected on Feb 29. It was kind of stressful day. We could mitigate the issue by callingFTPHost.synchronize_time()
. I didn't even know this was a thing 😅.Assuming that I correctly unterstand the logic behind subtracting 1 from current year when the date looks like it's in the future (stat.py:299), I think there is still a bug in the code:
If the date looks like it is in the future (either due to time-sync-issues or whatever reason), we have to assume that the date belongs to the previous year. Fine. Since this logic is supposed to work for any date, it should also work for Feb 29. As seen, it is not possible to just subtract one year on Feb 29 🤣
When the date looks like it is from the future, and this date is Feb 29, should not subtract 1 from current year but find and use the previous leap year instead. This formula always finds the previous leap year for a given year:
(year - 1) // 4 * 4
. You can run the code here to verify that the formula works correctly: https://gist.github.com/lollo0296/be6b950aabdc899827647d7987304c7dI wanted to submit a pull request, but I have no clue how to do it on sourcehat. I found a mirror repo on GitHub, but this seems to be very old code. For this reason, I will submit my proposal here:
#Current code (stat.py:299):
if self._datetime( server_year, month, day, hour, minute, 0 ) > server_now.replace(second=0) + datetime.timedelta(seconds=120): server_year -= 1#My change proposal:
if self._datetime( server_year, month, day, hour, minute, 0 ) > server_now.replace(second=0) + datetime.timedelta(seconds=120): if month == 2 and day == 29: server_year = (server_year - 1) // 4 * 4 else: server_year -= 1Cheers 🍻
Hi ~lorenzom,
Many thanks for your feedback! :-)
If the date looks like it is in the future (either due to time-sync-issues or whatever reason), we have to assume that the date belongs to the previous year. Fine. Since this logic is supposed to work for any date, it should also work for Feb 29. As seen, it is not possible to just subtract one year on Feb 29 🤣
When the date looks like it is from the future, and this date is Feb 29, should not subtract 1 from current year but find and use the previous leap year instead. This formula always finds the previous leap year for a given year: (year - 1) // 4 * 4.
I think that doesn't really help.
In my opinion, the goal shouldn't be to use the "same" date (same day and month), but to find a "correct" date (or "as correct as possible"). Subtracting four years would obviously give you a date four years in the past whereas in all other cases we go back one year. A slightly better choice, I think, would be to choose a date close to "February 29th, but one year ago". For example, you could pick March 1st of the previous year. This is similar to what some people do whose birthday is on February 29th: in non-leap years, they celebrate their birthday on March 1st.
That said, I think neither of these approaches is ok. Yes, they'd avoid the exception, but we'd use a date that we know is incorrect. On the other hand, the current ftputil code works if the time shift is set correctly.
Now, I suspect that relatively few people read the documentation attentively enough and aren't even aware that they're supposed to set the time shift. That's not really a complaint. Everyone, including me, cuts corners in some situations. Nobody's perfect.
Let's look at the time shift-related use cases which I think are relevant for our discussion:
- If you want to use methods like
upload_if_newer
anddownload_if_newer
, you must set a correct time shift. Otherwise these methods won't work correctly. The only exception is if your current time zone is +00:00, in which case the default time shift of 0.0 happens to be correct.- You only want to get the directory or file names in a directory, or more generally, you're not interested in the last modification datetime at all. You also don't want to use
upload_if_newer
ordownload_if_newer
(see the previous case). You also never rely on the datetime returned byFTPHost.stat
or similar methods.For the second use case, we only want to avoid that we get an exception when parsing a directory. In this case, either your approach or mine (picking March 1st) would work.
However, this ignores the first use case. Here, we'd avoid the exception, but we'd use a timestamp we know to be wrong. I can't help it, but in my opinion, that's worse than assuming the time shift is set correctly and use wrong timestamps because of that (although in both cases we use a wrong timestamp, no more, no less).
After thinking about the problem for a while (which is the reason I didn't reply earlier), I think the problem is that since the time shift is set to 0.0 by default, ftputil can't distinguish between these cases:
- The time shift actually is 0.0, left at the default.
- The time shift actually is 0.0, and was explicitly set to this value.
- The time shift was never set, but actually is not 0.0.
At the moment, I think the cleanest way out of this would be:
- When creating an
FTPHost
object, set the time shift to a special value likeUNSET_TIME_SHIFT
.- If this default is present, parsing a directory line will set the timestamp to
None
since no timestamp can be determined. This would support the use case where you're not interested in the timestamps (neither implicitly nor explicitly).- If this default is present and you use
upload_if_newer
ordownload_if_newer
, ftputil will raise an exception, complaining about the unknown time shift.- If the time shift is determined with
synchronize_times
or set explicitly withset_time_shift
, ftputil will behave as it does currently: You get timestamps corrected by the time shift and parsing directory entries always works (as long as the time shift is set correctly and the server doesn't actually use timestamps in the future ;-) ).Using this approach, it would mean that accidental uses of the default time shift 0.0 result in an exception instead of bogus results and behavior. However, in cases where up to now the time shift was intentionally left at or was set to 0.0, this is a breaking change.
Unless we come up with a better idea, I suggest the following:
- I'll (hopefully soon, but it may take a while) release ftputil 5.2, which, if the time shift is used and is 0.0, will print a deprecation warning, but won't otherwise change the behavior.
- Later, I'll release ftputil 6.0 (new major version number because I follow SemVer and this is a breaking change) with the
UNSET_TIME_SHIFT
approach described above.As I said, this would be a breaking change for users who deliberately used a time shift of 0.0. (Although I think, using
set_time_shift
explicitly would be cleaner.)(There are also edge cases where you actually can ignore wrong timestamps. If you have a job on the FTP server that adds new files every week and you have a script that runs (for example) three days after the files were added, time zone differences don't matter. For example,
download_if_newer
will detect that the remote and local timestamps for the new files are more than a day apart (regardless of the time shift). I only mention this edge case here for completeness. I don't think it should change anything in the chosen release strategy.)I wanted to submit a pull request, but I have no clue how to do it on sourcehat.
The approach recommended by Sourcehut is described here, but personally I don't mind getting a text diff, assuming the changes aren't too large. And if they're large, it would anyway make a lot of sense to contact me first before working on a PR. :-)
I found a mirror repo on GitHub, but this seems to be very old code.
Sorry about that. The Github repo is supposed to be an up-to-date mirror. I thought I had my Git remotes set up so that
git push
pushes both to Sourcehut and Github, but I confused this with another repo where I had this set up like that. I fixed the remotes now and pushed the current state of my repository so Sourcehut and Github are in sync.Regarding PRs on Github, I'd deactivate them if I could, but it seems that's not possible. Unfortunately, you can deactivate issues and wikis, but not PRs.
By the way, actually the time shift doesn't specify the time zone offset at the place of the server, but the time zone offset of the timestamps the server outputs in directory listings. In other words, if the server outputs UTC, the time shift must be set to 0.0.
This also means that actually wanting a time shift of 0.0 isn't as rare as I thought because (I hope) nowadays many FTP servers are configured to show UTC timestamps.
That said, this doesn't change anything for the above suggestion for cleaning up the API.