~torresjrjr

Europe/London

https://torresjrjr.com/

Trackers

~torresjrjr/lateximgbot (last active 3 years ago)

~torresjrjr/Bezier.py (last active 3 years ago)

~torresjrjr/go-nestedtext (last active 3 years ago)

~torresjrjr/fetch (last active 3 years ago)

~torresjrjr/birck.vim (last active 3 years ago)

~torresjrjr/pdssg (last active 3 years ago)

~torresjrjr/gemini.vim (last active 3 years ago)

~torresjrjr/linkchanbot (last active 3 years ago)

~torresjrjr/dotfiles (last active 3 years ago)

#649 time: handle UTC leapsecond representation 17 days ago

Comment by ~torresjrjr on ~sircmpwn/hare

I think trying to present leapsecs in dates ("23:59:60") is a lost cause.

Upon reflection, it's just a shorthand, a hack, which breaks consistency. Instants and Dates are meant to have a 1-to-1 correspondence. Having two representations for one UTC instant would require a major set of hacks to the current design, which isn't worth it.

To give an idea of such a hack, we could introduce another field to date::date{}, say a boolean .leap field, which specifies whether the embedded instant, if it holds values representing a positive leapsec, is the "first" or the "second" instance of that leapsec.

  • How would date::add() work? Would it decide between incrementing .sec or toggling .leap? How would it even know when to check for leapsecs? Semantically, how is adding even reliable when it increments only some of the time?
  • How does this generalize to other pairs of timescales with a different form of disambiguation (not leapsecs, something else)?
  • How to convert a UTC instant to a date? Do we introduce yet another parameter to from_instant()? What about its error types for when conversions fail?
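
For concreteness, the hack described above might look roughly like the following sketch. This is not real stdlib code: the .leap flag is the hypothetical addition, and the other fields merely stand in for whatever date::date{} actually embeds.

    // Hypothetical variant of date::date{} with a disambiguating flag.
    type leapdate = struct {
        sec: i64,   // seconds of the embedded instant
        nsec: i64,  // nanoseconds of the embedded instant
        leap: bool, // true: the repeated ("second") occurrence of a positive leapsec
    };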

All of this smells like the cascading result of a fatal design flaw.

If one wants to make a clock which prints ":60" on positive leapsecs, just add to your for-loop a conversion from UTC to TAI and check for the leapsec, then account for it. You'd have to do that anyway, since such a clock would probably use a combination of network and monotonic time sources if it wants to be that accurate. Heck, maybe we could add that to hare-dateutil.
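
As a rough illustration of that loop body, here is a minimal sketch. utc_offset() is a hypothetical stand-in for a real leap-second table lookup (e.g. built from leap-seconds.list), and the check is conceptual only, since a REALTIME clock does not actually tick through 23:59:60.

    use fmt;
    use time;

    // Hypothetical helper: TAI-UTC offset in effect at a given Unix timestamp.
    // A real clock would load the full table; only the two most recent
    // entries are shown here.
    fn utc_offset(sec: i64) i64 = {
        if (sec >= 1483228800) return 37; // 2017-01-01
        if (sec >= 1435708800) return 36; // 2015-07-01
        return 35;
    };

    export fn main() void = {
        const now = time::now(time::clock::REALTIME);
        // If the offset grows within the next second, a positive leap second
        // is being inserted and the seconds field should read 60.
        const leaping = utc_offset(now.sec + 1) > utc_offset(now.sec);
        let secfield = now.sec % 60;
        if (leaping) {
            secfield = 60;
        };
        fmt::printfln("seconds field: {}", secfield)!;
    };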

REPORTED → RESOLVED BY_DESIGN

#696 regex: Add multiple alternation 7 months ago

Comment by ~torresjrjr on ~sircmpwn/hare

Max Schillinger referenced this ticket in commit bf1e316.

#695 regex: Add whole-expression alternation 7 months ago

Comment by ~torresjrjr on ~sircmpwn/hare

Max Schillinger referenced this ticket in commit 413a5e4.

#820 What happens to objects after an action on that object errors out? 7 months ago

Comment by ~torresjrjr on ~sircmpwn/hare

Bor Grošelj Simić referenced this ticket in commit 10f02df.

#819 streams that embed other streams shouldn't use static buffers 7 months ago

Comment by ~torresjrjr on ~sircmpwn/hare

Bor Grošelj Simić referenced this ticket in commit 8eb08f6.

#935 debug::backtrace doesn't work with +libc binaries 8 months ago

Comment by ~torresjrjr on ~sircmpwn/hare

Drew DeVault referenced this ticket in commit 72ad406.

#931 Build fails on aarch64 due to missing syscalls 8 months ago

Comment by ~torresjrjr on ~sircmpwn/hare

Drew DeVault referenced this ticket in commit 0bcecc5.

#930 disallow len() on strings 9 months ago

Comment by ~torresjrjr on ~sircmpwn/hare

On Sun Mar 10, 2024 at 12:38 AM GMT, ~ecs wrote:

> imo the thing that makes sense is to just avoid canonizing any one length measurement in len()

len() isn't special cased for strings. It just does what it always does. It's the string that bends to len()'s will. I don't see that as canonizing. But a footgun? I can see that.

> ultimately this is an extension of disallowing indexing on strings: byte indexing is Wrong, rune indexing is expensive and still not necessarily correct, grapheme indexing is expensive and extremely complicated and definitely not a good fit for us, so we just don't have any built-in indexing. the same arguments apply for length measurements, though bytewise length is less wrong than bytewise indexing, so imo it makes sense to have in the stdlib

I do find this point convincing, however: strings are currently a sort of hybrid type which, on a builtin level, can act like a slice in some ways (len(s)) but not others (s[i]), and that is not good.

Having the user explicitly convert to []u8 is good. For the same reason, I think convenience functions which do this for them under the hood, like strings::bytelen(), are also good.
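
For illustration, here is a minimal sketch of that explicitness using only calls that exist today, strings::toutf8 and strings::runelen; the proposed strings::bytelen convenience is deliberately not used.

    use fmt;
    use strings;

    export fn main() void = {
        const s = "héllo"; // 'é' encodes as two bytes in UTF-8
        // Explicit byte length, via conversion to []u8:
        fmt::printfln("bytes: {}", len(strings::toutf8(s)))!; // 6
        // Codepoint count, a different (and costlier) measure:
        fmt::printfln("runes: {}", strings::runelen(s))!; // 5
    };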

+1 for dropping len(string).

#930 disallow len() on strings 9 months ago

Comment by ~ecs on ~sircmpwn/hare

On Sun Mar 10, 2024 at 12:17 AM UTC, ~torresjrjr wrote:

> On Sat Mar 9, 2024 at 9:53 PM GMT, ~ecs wrote:

>> ultimately the issue with having a builtin len(str) is that the user should always think carefully before getting any length of a string, and allowing len(str) makes it seem like they can avoid doing that. it also

> You'd have to not understand what encodings are to think there is a one-size-fits-all best way to measure strings. Treating the programmer as if they don't know what encodings or graphemes are is a weird level of coddling. It's reasonable to expect them to know this.

that's exactly my point: there's no one-size-fits-all way to measure strings, and having a len() implies that there is. getting rid of footguns isn't "coddling", or at least, it's not out of line with hare's design philosophy, and imo len(str), no matter what unit it's in, is a footgun

>> sends the message that byte length is a more useful measure of strings,

> When a Hare programmer is introduced to len(), they are taught that it "returns the length of a given slice/array". When they are introduced to Hare strings, they're given the understanding that they are []u8 under the hood. They put two and two together: You give len() a string, you get a byte-length.

sure, len() being the byte length is more consistent, and makes sense from a language design perspective. the issue is that it's also a footgun, because it makes subtly-wrong semantics look like the simplest ones

a broader theme in hare's design philosophy which applies here is that the easy thing should, by default, be the right thing. propagating errors is usually the right thing, so we made it the easy thing. making sure you don't have out-of-bounds accesses is usually the right thing, so we made it the easy thing. there're always cases outside of that "usually", which is why we have things like unbounded arrays, but we try to make sure that they're not the first thing someone reaches for unless they're absolutely certain

len(str) breaks this, because it's the easy thing to do but byte length isn't what's usually right here, because there isn't any unambiguous "usually right". non-byte lengths definitely can't be builtins for time complexity reasons, byte length is almost always irrelevant in a unicode context, and so imo the thing that makes sense is to just avoid canonizing any one length measurement in len()

ultimately this is an extension of disallowing indexing on strings: byte indexing is Wrong, rune indexing is expensive and still not necessarily correct, grapheme indexing is expensive and extremely complicated and definitely not a good fit for us, so we just don't have any built-in indexing. the same arguments apply for length measurements, though bytewise length is less wrong than bytewise indexing, so imo it makes sense to have in the stdlib

>> (also, users should usually be iterating over strings rather than getting their length

> Depends on the problem you're solving.

for (let i = 0z; i < len(some_str); i += 1) is pretty much never a thing you want to do, and if it is, it's perfectly reasonable to need to explicitly opt into bytewise length via strings::bytelen()
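
As a sketch of the iteration ~ecs describes, one can walk the runes of a string without ever taking a byte-indexed length; this assumes the current strings::iter/strings::next interface.

    use fmt;
    use strings;

    export fn main() void = {
        const s = "héllo";
        let runes = 0z;
        let it = strings::iter(s);
        for (true) {
            match (strings::next(&it)) {
            case rune =>
                runes += 1;
            case done =>
                break;
            };
        };
        fmt::printfln("runes: {}", runes)!;
    };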

#930 disallow len() on strings 9 months ago

Comment by ~torresjrjr on ~sircmpwn/hare

On Sat Mar 9, 2024 at 9:53 PM GMT, ~ecs wrote:

> ultimately the issue with having a builtin len(str) is that the user should always think carefully before getting any length of a string, and allowing len(str) makes it seem like they can avoid doing that. it also

You'd have to not understand what encodings are to think there is a one-size-fits-all best way to measure strings. Treating the programmer as if they don't know what encodings or graphemes are is a weird level of coddling. It's reasonable to expect them to know this.

We can at a minimum mention these topics and document [[strings::runelen]] alongside len() in the intro tutorial.

> sends the message that byte length is a more useful measure of strings,

When a Hare programmer is introduced to len(), they are taught that it "returns the length of a given slice/array". When they are introduced to Hare strings, they're given the understanding that they are []u8 under the hood. They put two and two together: You give len() a string, you get a byte-length.

> (also, users should usually be iterating over strings rather than getting their length

Depends on the problem you're solving.