~sschwarzer/ftputil#160: 
Feb 29 Error -- ValueError: day is out of range for month

Hello,

I am on ftputil 5.1.0, Python 3.11.8 (linux)

Today is a special date, Feb 29, 2024. I am at timezone UTC+8.

I am getting an error in my code

object_name_list = ftp_host.listdir(folder_path)

Error message is:

> Traceback (most recent call last):
>  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 210, in _datetime
>    return datetime.datetime(
>           ^^^^^^^^^^^^^^^^^^
> ValueError: day is out of range for month

After digging into it I can tell that the error came from Line 308 to Line 311:

>        server_utc_datetime = self._datetime(
>            server_year, month, day, hour, minute, 0
>        ) - datetime.timedelta(seconds=time_shift)
>        st_mtime = server_utc_datetime.timestamp()

When it tried to subtract one year from the current date, an exception happened as Feb 29, 2023 is not a valid date.

I believe the error will go away tomorrow, but it would be great if it could be fixed before the next Feb 29 :)

Thank you, Stacy

Status
REPORTED
Submitter
~stacylo
Assigned to
No-one
Submitted
11 months ago
Updated
10 months ago
Labels
library

~sschwarzer 11 months ago

Hi ~stacylo, thanks for the report!

Do you have the line from the directory listing where the error occured? You can add a

print("===", line)

at line 472. You can obscure the file name if it's necessary for privacy reasons.

Also, even if it probably doesn't matter, at which time (roughly) did you run the listdir call?

~sschwarzer 11 months ago

Also, can you please give me the values of the arguments of parse_unix_time for which the problem happens?

~sschwarzer 11 months ago

Is the server in a different timezone from the client (where you run ftputil)? If yes, did you set the time zone (with FTPHost.synchronize_times or FTPHost.set_time_shift) before calling FTPHost.listdir?

~stacylo (edited) 11 months ago*

Hi Stefan,

I ran it sometime in the morning between 10:30 and 11:00, Feb 29, 2024. And yes, the time should not matter. I believe the error is going to occur for the whole day Feb 29, 2024.

Below is the traceback for the error so you can see which line. (Sorry about the format. Do you know how I can keep the correct line breaks in the markdown?)


Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 210, in _datetime
    return datetime.datetime(
           ^^^^^^^^^^^^^^^^^^
ValueError: day is out of range for month

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 801, in __call_with_parser_retry
    result = method(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 768, in _real_stat
    lstat_result = self._real_lstat(path, _exception_for_missing_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 729, in _real_lstat
    for stat_result in self._stat_results_from_dir(dirname):
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 636, in _stat_results_from_dir
    stat_result = self._parser.parse_line(line, self._host.time_shift())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 472, in parse_line
    st_mtime, st_mtime_precision = self.parse_unix_time(
                                   ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 308, in parse_unix_time
    server_utc_datetime = self._datetime(
                          ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 218, in _datetime
    raise ftputil.error.ParserError(
ftputil.error.ParserError: invalid datetime '2023-02-29 17:02:00'
Debugging info: ftputil 5.1.0, Python 3.11.8 (linux)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 549, in parse_line
    st_size = int(dir_or_size)
              ^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'ftp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/project/src/main.py", line 49, in <module>
    main()
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/project/src/main.py", line 36, in main
    is_success = import_orders(sap_environment=sap_environment)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/project/src/functions/import_orders.py", line 429, in import_orders
    is_success_1 = process_folder(sap_grpc_client=sap_grpc_client, ftp_host=ftp_host, folder_dict=folder_dict)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/project/src/functions/import_orders.py", line 372, in process_folder
    object_name_list = ftp_host.listdir(folder_path)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/host.py", line 988, in listdir
    items = self._stat._listdir(path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 825, in _listdir
    return self.__call_with_parser_retry(self._real_listdir, path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 801, in __call_with_parser_retry
    result = method(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 680, in _real_listdir
    if not self._path.isdir(path):
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/path.py", line 154, in isdir
    return self._is_file_system_entity(path, "dir")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/path.py", line 132, in _is_file_system_entity
    stat_result = self._host.stat(path, _exception_for_missing_path=False)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/host.py", line 1024, in stat
    return self._stat._stat(path, _exception_for_missing_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 845, in _stat
    return self.__call_with_parser_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 801, in __call_with_parser_retry
    result = method(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 768, in _real_stat
    lstat_result = self._real_lstat(path, _exception_for_missing_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 719, in _real_lstat
    if not self._path.isdir(dirname) and not _exception_for_missing_path:
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/path.py", line 154, in isdir
    return self._is_file_system_entity(path, "dir")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/path.py", line 132, in _is_file_system_entity
    stat_result = self._host.stat(path, _exception_for_missing_path=False)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/host.py", line 1024, in stat
    return self._stat._stat(path, _exception_for_missing_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 845, in _stat
    return self.__call_with_parser_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 811, in __call_with_parser_retry
    return method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 768, in _real_stat
    lstat_result = self._real_lstat(path, _exception_for_missing_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 729, in _real_lstat
    for stat_result in self._stat_results_from_dir(dirname):
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 636, in _stat_results_from_dir
    stat_result = self._parser.parse_line(line, self._host.time_shift())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 551, in parse_line
    raise ftputil.error.ParserError("invalid size {}".format(dir_or_size))
ftputil.error.ParserError: invalid size ftp
Debugging info: ftputil 5.1.0, Python 3.11.8 (linux)

~stacylo 11 months ago

For the time zone, my local machine is UTC+8. The FTP server should be UTC+8 as well, but I am not sure.

The values of the arguments of parse_unix_time are

  • month_abbreviation='Feb'
  • day='29'
  • year_or_time='17:12'
  • time_shift=0.0
  • with_precision=True

~sschwarzer 11 months ago*

Do you know how I can keep the correct line breaks in the markdown?

You can use literal formatting of a code block by using a line with three backticks before and after the code block. I added the backticks in your description and comment.

I think you can also use all the Markdown formatting described in the linked Markdown Guide in Sourcehut ticket descriptions and comments.

~sschwarzer 11 months ago

This looks weird:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/ftputil/stat.py", line 549, in parse_line
    st_size = int(dir_or_size)
              ^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'ftp'

Can you print the whole problematic line before it gets parsed (or tried to be parsed ;-) )?

~sschwarzer 11 months ago

This looks weird: ...

Also, this exception happens in the MS parser, but the invalid datetime is encountered in the unix parser. I haven't checked all details yet, but it could be during a parser retry, where ftputil tries the unix parser first and when it fails, falls back to the MS parser.

Anyway, I'll check all the stacktraces more closely.

Still, the whole failing line could be really helpful.

~stacylo 11 months ago*

Can you print the whole problematic line before it gets parsed (or tried to be parsed ;-) )?

Yes, it is: drwxr-xr-x 1 ftp ftp 0 Feb 29 17:31 .

~sschwarzer 11 months ago*

Yes, it is: drwxr-xr-x 1 ftp ftp 0 Feb 29 17:31 .

Thanks! :-)

The values of the arguments of parse_unix_time are

month_abbreviation='Feb'
day='29'
year_or_time='17:12'
time_shift=0.0
with_precision=True

If you have the time shift set to 0.0 and call listdir at around 11:00 in your timezone, ftputil inspects the datetime "Feb 29 17:31", assumes it can't be in the future and therefore assumes it belongs to the last year. (That's the common interpretation for dates/datetimes without a year in FTP directory listings.) Therefore, ftputil tries to subtract one year to get the actual date, which fails since 2023-02-29 is invalid.

So either the FTP server reports a wrong time for the listed directory/file, the time shift is set incorrectly - or something else that I haven't thought of yet. ;-) But I think the most likely reason is that the server is in a different time zone and you have to set the time shift (server time - UTC time) on your end.

If you have write permissions in some directory on the server, you can use FTPHost.synchronize_times to detect and set the time shift automatically, otherwise you'd have to use FTPHost.set_time_shift. If you can't use synchronize_times and don't know the time zone of the server, you could try using ftplib.FTP.mlsd for the file. However, I don't know how the MLSD command handles time zones / offsets, i.e. if the time is UTC or local (only the former would be useful for you). Moreover, the server might not support the MLSD command.

~sschwarzer 11 months ago*

If you can't find out the actual time shift to use and you're only interested in the names of the directories/files and never in the actual datetimes, it might be enough to set a high time shift, e.g. 24*60*60. In that case, the time from the server should be considered to be in the past and the parsing should succeed. I know it's a hack, but I think that's the best that can be done.

~stacylo 11 months ago

Yeah okay, I think the problem has been resolved.

ftp_host.synchronize_times() does not work for me. It threw the same error.

So I did ftp_host.set_time_shift(time_shift=28800) before ftp_host.listdir(folder_path), and it worked.

It is also a good suggestion to set a high shift because all I need is the list of file names.

Thank you!!!

~sschwarzer 11 months ago

You're welcome! :-)

~sschwarzer REPORTED CLOSED 11 months ago

~lorenzom 11 months ago*

Hello Stefan,

our Python projects which uses ftputil were also affected on Feb 29. It was kind of stressful day. We could mitigate the issue by calling FTPHost.synchronize_time(). I didn't even know this was a thing 😅.

Assuming that I correctly unterstand the logic behind subtracting 1 from current year when the date looks like it's in the future (stat.py:299), I think there is still a bug in the code:

If the date looks like it is in the future (either due to time-sync-issues or whatever reason), we have to assume that the date belongs to the previous year. Fine. Since this logic is supposed to work for any date, it should also work for Feb 29. As seen, it is not possible to just subtract one year on Feb 29 🤣

When the date looks like it is from the future, and this date is Feb 29, should not subtract 1 from current year but find and use the previous leap year instead. This formula always finds the previous leap year for a given year: (year - 1) // 4 * 4. You can run the code here to verify that the formula works correctly: https://gist.github.com/lollo0296/be6b950aabdc899827647d7987304c7d

I wanted to submit a pull request, but I have no clue how to do it on sourcehat. I found a mirror repo on GitHub, but this seems to be very old code. For this reason, I will submit my proposal here:

#Current code (stat.py:299):

if self._datetime(
                server_year, month, day, hour, minute, 0
            ) > server_now.replace(second=0) + datetime.timedelta(seconds=120):
                server_year -= 1

#My change proposal:

if self._datetime(
                server_year, month, day, hour, minute, 0
            ) > server_now.replace(second=0) + datetime.timedelta(seconds=120):
                if month == 2 and day == 29:
                    server_year = (server_year - 1) // 4 * 4 
                else:
                    server_year -= 1

Cheers 🍻

~sschwarzer 11 months ago*

Hi ~lorenzom,

Many thanks for your feedback! :-)

If the date looks like it is in the future (either due to time-sync-issues or whatever reason), we have to assume that the date belongs to the previous year. Fine. Since this logic is supposed to work for any date, it should also work for Feb 29. As seen, it is not possible to just subtract one year on Feb 29 🤣

When the date looks like it is from the future, and this date is Feb 29, should not subtract 1 from current year but find and use the previous leap year instead. This formula always finds the previous leap year for a given year: (year - 1) // 4 * 4.

I think that doesn't really help.

In my opinion, the goal shouldn't be to use the "same" date (same day and month), but to find a "correct" date (or "as correct as possible"). Subtracting four years would obviously give you a date four years in the past whereas in all other cases we go back one year. A slightly better choice, I think, would be to choose a date close to "February 29th, but one year ago". For example, you could pick March 1st of the previous year. This is similar to what some people do whose birthday is on February 29th: in non-leap years, they celebrate their birthday on March 1st.

That said, I think neither of these approaches is ok. Yes, they'd avoid the exception, but we'd use a date that we know is incorrect. On the other hand, the current ftputil code works if the time shift is set correctly.

Now, I suspect that relatively few people read the documentation attentively enough and aren't even aware that they're supposed to set the time shift. That's not really a complaint. Everyone, including me, cuts corners in some situations. Nobody's perfect.

Let's look at the time shift-related use cases which I think are relevant for our discussion:

  • If you want to use methods like upload_if_newer and download_if_newer, you must set a correct time shift. Otherwise these methods won't work correctly. The only exception is if your current time zone is +00:00, in which case the default time shift of 0.0 happens to be correct.
  • You only want to get the directory or file names in a directory, or more generally, you're not interested in the last modification datetime at all. You also don't want to use upload_if_newer or download_if_newer (see the previous case). You also never rely on the datetime returned by FTPHost.stat or similar methods.

For the second use case, we only want to avoid that we get an exception when parsing a directory. In this case, either your approach or mine (picking March 1st) would work.

However, this ignores the first use case. Here, we'd avoid the exception, but we'd use a timestamp we know to be wrong. I can't help it, but in my opinion, that's worse than assuming the time shift is set correctly and use wrong timestamps because of that (although in both cases we use a wrong timestamp, no more, no less).

After thinking about the problem for a while (which is the reason I didn't reply earlier), I think the problem is that since the time shift is set to 0.0 by default, ftputil can't distinguish between these cases:

  • The time shift actually is 0.0, left at the default.
  • The time shift actually is 0.0, and was explicitly set to this value.
  • The time shift was never set, but actually is not 0.0.

At the moment, I think the cleanest way out of this would be:

  • When creating an FTPHost object, set the time shift to a special value like UNSET_TIME_SHIFT.
  • If this default is present, parsing a directory line will set the timestamp to None since no timestamp can be determined. This would support the use case where you're not interested in the timestamps (neither implicitly nor explicitly).
  • If this default is present and you use upload_if_newer or download_if_newer, ftputil will raise an exception, complaining about the unknown time shift.
  • If the time shift is determined with synchronize_times or set explicitly with set_time_shift, ftputil will behave as it does currently: You get timestamps corrected by the time shift and parsing directory entries always works (as long as the time shift is set correctly and the server doesn't actually use timestamps in the future ;-) ).

Using this approach, it would mean that accidental uses of the default time shift 0.0 result in an exception instead of bogus results and behavior. However, in cases where up to now the time shift was intentionally left at or was set to 0.0, this is a breaking change.

Unless we come up with a better idea, I suggest the following:

  • I'll (hopefully soon, but it may take a while) release ftputil 5.2, which, if the time shift is used and is 0.0, will print a deprecation warning, but won't otherwise change the behavior.
  • Later, I'll release ftputil 6.0 (new major version number because I follow SemVer and this is a breaking change) with the UNSET_TIME_SHIFT approach described above.

As I said, this would be a breaking change for users who deliberately used a time shift of 0.0. (Although I think, using set_time_shift explicitly would be cleaner.)

(There are also edge cases where you actually can ignore wrong timestamps. If you have a job on the FTP server that adds new files every week and you have a script that runs (for example) three days after the files were added, time zone differences don't matter. For example, download_if_newer will detect that the remote and local timestamps for the new files are more than a day apart (regardless of the time shift). I only mention this edge case here for completeness. I don't think it should change anything in the chosen release strategy.)

I wanted to submit a pull request, but I have no clue how to do it on sourcehat.

The approach recommended by Sourcehut is described here, but personally I don't mind getting a text diff, assuming the changes aren't too large. And if they're large, it would anyway make a lot of sense to contact me first before working on a PR. :-)

I found a mirror repo on GitHub, but this seems to be very old code.

Sorry about that. The Github repo is supposed to be an up-to-date mirror. I thought I had my Git remotes set up so that git push pushes both to Sourcehut and Github, but I confused this with another repo where I had this set up like that. I fixed the remotes now and pushed the current state of my repository so Sourcehut and Github are in sync.

Regarding PRs on Github, I'd deactivate them if I could, but it seems that's not possible. Unfortunately, you can deactivate issues and wikis, but not PRs.

~sschwarzer 11 months ago

By the way, actually the time shift doesn't specify the time zone offset at the place of the server, but the time zone offset of the timestamps the server outputs in directory listings. In other words, if the server outputs UTC, the time shift must be set to 0.0.

This also means that actually wanting a time shift of 0.0 isn't as rare as I thought because (I hope) nowadays many FTP servers are configured to show UTC timestamps.

That said, this doesn't change anything for the above suggestion for cleaning up the API.

~sschwarzer CLOSED REPORTED 10 months ago

Register here or Log in to comment, or comment via email.