Comment by ~torresjrjr on ~sircmpwn/hare
I think trying to present leapsecs in dates ("23:59:60") is a lost cause.
Upon reflection, it's just a shorthand, a hack, which breaks consistency. Instants and Dates are meant to have a 1-to-1 correspondence. Having two representations for one UTC instant would require a major set of hacks to the current design, which isn't worth it.
To give an idea of such a hack: we could introduce another field to date::date{}, say a boolean .leap field, which specifies whether the embedded instant, if it were to hold values representing a positive leapsec, is the "first" or the "second" instance of that leapsec.
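A rough sketch of what that would mean (hypothetical field layout, not the real date::date):

use time::chrono;

// Hypothetical sketch of the rejected hack, NOT the real date::date:
// a .leap flag marking whether the embedded instant stands for the
// first or the second pass through a positive leapsec.
type date = struct {
	chrono::moment,	// the embedded instant, as in the current design
	leap: bool,	// true: this is the ":60" (second) occurrence
	// ...the rest of date's cached fields elided...
};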
- How would date::add() work? Would it decide between incrementing .sec or toggling .leap? How would it even know when to check for leapsecs? Semantically, how is adding even reliable when it increments only some of the time?
- How does this generalize to other pairs of timescales with a different form of disambiguation (not leapsecs, something else)?
- How to convert a UTC instant to a date? Do we introduce yet another parameter to from_instant()? What about its error types for when conversions fail?
All of this smells like the cascading result of a fatal design flaw.
If one wants to make a clock which prints ":60" on positive leapsecs, just add to your for-loop a conversion from UTC to TAI, check for the leapsec, and account for it. You'd have to do that anyway, since such a clock would probably use a combination of network and monotonic time sources if it wants to be that accurate. Heck, maybe we could add that to hare-dateutil.
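Very roughly, something like this. utc_tai_offset() is a hypothetical stand-in for whatever time::chrono conversion you'd actually wrap; only time::add() and time::SECOND below are real stdlib items:

use time;

// Hypothetical helper: the TAI-UTC offset, in whole seconds, at
// instant i. A real clock would derive this from time::chrono's
// UTC/TAI conversion; stubbed here for the sketch.
fn utc_tai_offset(i: time::instant) i64 = {
	return 37; // stub
};

// A positive leapsec is in effect at i iff the offset grows across
// the next second; that is the tick where the clock prints ":60".
fn in_leapsec(i: time::instant) bool = {
	return utc_tai_offset(time::add(i, time::SECOND)) >
		utc_tai_offset(i);
};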
REPORTED → RESOLVED BY_DESIGN
Max Schillinger referenced this ticket in commit bf1e316.
Max Schillinger referenced this ticket in commit 413a5e4.
Bor Grošelj Simić referenced this ticket in commit 10f02df.
Bor Grošelj Simić referenced this ticket in commit 8eb08f6.
Drew DeVault referenced this ticket in commit 72ad406.
Drew DeVault referenced this ticket in commit 0bcecc5.
Comment by ~torresjrjr on ~sircmpwn/hare
On Sun Mar 10, 2024 at 12:38 AM GMT, ~ecs wrote:
> imo the thing that makes sense is to just avoid canonizing any one length measurement in len()
len() isn't special cased for strings. It just does what it always does. It's the string that bends to len()'s will. I don't see that as canonizing. But a footgun? I can see that.
> ultimately this is an extension of disallowing indexing on strings: byte indexing is Wrong, rune indexing is expensive and still not necessarily correct, grapheme indexing is expensive and extremely complicated and definitely not a good fit for us, so we just don't have any built-in indexing. the same arguments apply for length measurements, though bytewise length is less wrong than bytewise indexing, so imo it makes sense to have in the stdlib
I do find this point convincing, however: strings are currently this sort of hybrid type which, on a builtin level, can act like a slice in some ways (len(s)) but not others (s[i]), and that is not good.
Having the user explicitly convert to []u8 is good. Having the user use convenience functions which do this for them under the hood, like strings::bytelen(), is good for the same reason.
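For the record, such a convenience function would be a one-liner over the explicit conversion. strings::toutf8() and strings::runelen() are in the stdlib today; bytelen here is just the proposed name:

use strings;

// What a strings::bytelen() would boil down to: reinterpret the str
// as its underlying []u8, then take the ordinary slice length.
fn bytelen(s: str) size = len(strings::toutf8(s));

// e.g. bytelen("résumé") == 8, while strings::runelen("résumé") == 6,
// since runelen() must walk the UTF-8 encoding (O(n)).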
+1 for dropping len(string).
> On Sun Mar 10, 2024 at 12:17 AM UTC, ~torresjrjr wrote:
>> On Sat Mar 9, 2024 at 9:53 PM GMT, ~ecs wrote:
>>> ultimately the issue with having a builtin len(str) is that the user should always think carefully before getting any length of a string, and allowing len(str) makes it seem like they can avoid doing that. it also
>> You'd have to not understand what encodings are to think there is a one-size-fits-all best way to measure strings. Treating the programmer as if they don't know what encodings or graphemes are is a weird level of coddling. It's reasonable to expect them to know this.
> that's exactly my point: there's no one-size-fits-all way to measure strings, and having a len() implies that there is. getting rid of footguns isn't "coddling", or at least, it's not out of line with hare's design philosophy, and imo len(str), no matter what unit it's in, is a footgun
>>> sends the message that byte length is a more useful measure of strings,
>> When a Hare programmer is introduced to len(), they are taught that it "returns the length of a given slice/array". When they are introduced to Hare strings, they're given the understanding that they are []u8 under the hood. They put two and two together: You give len() a string, you get a byte-length.
> sure, len() being the byte length is more consistent, and makes sense from a language design perspective. the issue is that it's also a footgun, because it makes subtly-wrong semantics look like the simplest ones
> a broader theme in hare's design philosophy which applies here is that the easy thing should, by default, be the right thing. propagating errors is usually the right thing, so we made it the easy thing. making sure you don't have out-of-bounds accesses is usually the right thing, so we made it the easy thing. there're always cases outside of that "usually", which is why we have things like unbounded arrays, but we try to make sure that they're not the first thing someone reaches for unless they're absolutely certain
> len(str) breaks this, because it's the easy thing to do but byte length isn't what's usually right here, because there isn't any unambiguous "usually right". non-byte lengths definitely can't be builtins for time complexity reasons, byte length is almost always irrelevant in a unicode context, and so imo the thing that makes sense is to just avoid canonizing any one length measurement in len()
> ultimately this is an extension of disallowing indexing on strings: byte indexing is Wrong, rune indexing is expensive and still not necessarily correct, grapheme indexing is expensive and extremely complicated and definitely not a good fit for us, so we just don't have any built-in indexing. the same arguments apply for length measurements, though bytewise length is less wrong than bytewise indexing, so imo it makes sense to have in the stdlib
>>> (also, users should usually be iterating over strings rather than getting their length
>> Depends on the problem you're solving.
> for (let i = 0z; i < len(some_str); i += 1) is pretty much never a thing you want to do, and if it is, it's perfectly reasonable to need to explicitly opt into bytewise length via strings::bytelen()
Comment by ~torresjrjr on ~sircmpwn/hare
On Sat Mar 9, 2024 at 9:53 PM GMT, ~ecs wrote:
> ultimately the issue with having a builtin len(str) is that the user should always think carefully before getting any length of a string, and allowing len(str) makes it seem like they can avoid doing that. it also
You'd have to not understand what encodings are to think there is a one-size-fits-all best way to measure strings. Treating the programmer as if they don't know what encodings or graphemes are is a weird level of coddling. It's reasonable to expect them to know this.
We can at a minimum mention these topics and document [[strings::runelen]] alongside len() in the intro tutorial.
> sends the message that byte length is a more useful measure of strings,
When a Hare programmer is introduced to len(), they are taught that it "returns the length of a given slice/array". When they are introduced to Hare strings, they're given the understanding that they are []u8 under the hood. They put two and two together: You give len() a string, you get a byte-length.
> (also, users should usually be iterating over strings rather than getting their length
Depends on the problem you're solving.
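That said, the iteration pattern is short enough for the common case. A sketch using strings::iter() and strings::next(), which returns (rune | done) in current Hare (older releases used void as the end sentinel):

use strings;

// Count runes by iterating: the "usually right" measurement when you
// care about characters rather than storage size, at O(n) cost.
fn count_runes(s: str) size = {
	let it = strings::iter(s);
	let n = 0z;
	for (true) {
		match (strings::next(&it)) {
		case rune =>
			n += 1;
		case done =>
			break;
		};
	};
	return n;
};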