Hello,
I'm currently switching from neomutt to aerc, and here's one of the first problems I have:
The mailcap entry used by neomutt is
text/html; bwrap_auto.bash w3m -I %{charset} -T text/html --; copiousoutput
and text/html=w3m -dump -T text/html
for the aerc filter.
Since the filters command are actually passed to sh, why not just set an AERC_CONTENT_CHARSET environment variable? Or always convert the filter input to UTF-8 and document it.
~q3cpma, Jul 30, 2022 at 09:44:
Since the filters command are actually passed to sh, why not just set an AERC_CONTENT_CHARSET environment variable? Or always convert the filter input to UTF-8 and document it.
I have been thinking about exporting a few variables to filters. Something like this should be a starting point:
diff --git a/widgets/msgviewer.go b/widgets/msgviewer.go index eec786c507a9..dd37ddcee999 100644 --- a/widgets/msgviewer.go +++ b/widgets/msgviewer.go @@ -4,6 +4,7 @@ import ( "bufio" "errors" "fmt" + "internal/syscall/execenv" "io" "os" "os/exec" @@ -547,9 +548,8 @@ func NewPartViewer(acct *AccountView, conf *config.AercConfig, pager = exec.Command(cmd[0], cmd[1:]...) info := msg.MessageInfo() + mime := strings.ToLower(part.MIMEType) + "/" + strings.ToLower(part.MIMESubType) for _, f := range conf.Filters { - mime := strings.ToLower(part.MIMEType) + - "/" + strings.ToLower(part.MIMESubType) switch f.FilterType { case config.FILTER_MIMETYPE: if fnmatch.Match(f.Filter, mime, 0) { @@ -578,6 +578,17 @@ func NewPartViewer(acct *AccountView, conf *config.AercConfig, } } if filter != nil { + env = execenv.Default(filter.SysProcAttr) + env = append(env, fmt.Sprintf("AERC_ACCOUNT=%s", acct.acct.Name)) + env = append(env, fmt.Sprintf("AERC_CONTENT_TYPE=%s", mime)) + if charset, found := part.Params["charset"]; fount { + env = append(env, fmt.Sprintf("AERC_CONTENT_CHARSET=%s", charset)) + } + if part.Encoding != "" { + env = append(env, fmt.Sprintf("AERC_CONTENT_ENCODING=%s", part.Encoding)) + } + filter.Env = env + if pipe, err = filter.StdinPipe(); err != nil { return nil, err }Do you want to apply this, test it and submit patch?
On Sat, Jul 30, 2022 at 03:37:24PM +0000, ~rjarry wrote:
~q3cpma, Jul 30, 2022 at 09:44:
Since the filters command are actually passed to sh, why not just set an AERC_CONTENT_CHARSET environment variable? Or always convert the filter input to UTF-8 and document it.
I have been thinking about exporting a few variables to filters. Something like this should be a starting point:
diff --git a/widgets/msgviewer.go b/widgets/msgviewer.go index eec786c507a9..dd37ddcee999 100644 --- a/widgets/msgviewer.go +++ b/widgets/msgviewer.go @@ -4,6 +4,7 @@ import ( "bufio" "errors" "fmt" + "internal/syscall/execenv" "io" "os" "os/exec" @@ -547,9 +548,8 @@ func NewPartViewer(acct *AccountView, conf *config.AercConfig, pager = exec.Command(cmd[0], cmd[1:]...) info := msg.MessageInfo() + mime := strings.ToLower(part.MIMEType) + "/" + strings.ToLower(part.MIMESubType) for _, f := range conf.Filters { - mime := strings.ToLower(part.MIMEType) + - "/" + strings.ToLower(part.MIMESubType) switch f.FilterType { case config.FILTER_MIMETYPE: if fnmatch.Match(f.Filter, mime, 0) { @@ -578,6 +578,17 @@ func NewPartViewer(acct *AccountView, conf *config.AercConfig, } } if filter != nil { + env = execenv.Default(filter.SysProcAttr) + env = append(env, fmt.Sprintf("AERC_ACCOUNT=%s", acct.acct.Name)) + env = append(env, fmt.Sprintf("AERC_CONTENT_TYPE=%s", mime)) + if charset, found := part.Params["charset"]; fount { + env = append(env, fmt.Sprintf("AERC_CONTENT_CHARSET=%s", charset)) + } + if part.Encoding != "" { + env = append(env, fmt.Sprintf("AERC_CONTENT_ENCODING=%s", part.Encoding)) + } + filter.Env = env + if pipe, err = filter.StdinPipe(); err != nil { return nil, err }Do you want to apply this, test it and submit patch?
-- View on the web: https://todo.sr.ht/~rjarry/aerc/65#event-191586
Sadly, I get
widgets/msgviewer.go:7:2: use of internal package internal/syscall/execenv not allowed
when doing a make. Completely new to Go, by the way.
The patch needed some changes. I have sent it on the mailing list:
https://lists.sr.ht/~rjarry/aerc-devel/patches/34308
Feel free to test it and comment.
On Sat, Jul 30, 2022 at 04:34:36PM +0000, ~rjarry wrote:
The patch needed some changes. I have sent it on the mailing list:
The environment is set properly and the charset variable has the correct value, but it looks like the filter input is mangled from the beginning:
I use this script as HTML filter to debug:
#!/bin/sh -eu tempfile=$(mktemp) stdin=$(tee "$tempfile") trap 'exit 1' HUP INT QUIT ABRT ALRM TERM trap 'rm "$tempfile"' EXIT { date echo 'STDIN:' echo "$stdin" echo "AERC_CONTENT_TYPE: $AERC_CONTENT_TYPE" echo "AERC_CONTENT_CHARSET: $AERC_CONTENT_CHARSET" echo "uchardet result: $(uchardet "$tempfile")" echo } >>~/filter_html.log echo "$stdin" | w3m -dump -I "$AERC_CONTENT_CHARSET" -T text/html
which gives me
Sat Jul 30 07:48:45 PM CEST 2022 STDIN: <html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="htt> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> ... AERC_CONTENT_TYPE: text/html AERC_CONTENT_CHARSET: iso-8859-1 uchardet result: UTF-8
Let me investigate a bit what mutt gives with this.
Using a mailcap with
text/html; env AERC_CONTENT_CHARSET=%{charset} ~/filter_html.sh
Neomutt gives me
... AERC_CONTENT_TYPE: empty AERC_CONTENT_CHARSET: iso-8859-1 uchardet result: ISO-8859-1
with actual Latin-1 content (e.g. 0xe9 for é).
PS: I guess it's fortunate that w3m -I '' works fine
Using
text/html=w3m -dump -I UTF-8 -T text/html
works fine, so maybe we can assume UTF-8 as input always? w3m was originally choking on it because the header announced latin-1, I guess.
On Sat Jul 30, 2022 at 8:07 PM CEST, ~q3cpma wrote:
Using
text/html=w3m -dump -I UTF-8 -T text/html
works fine, so maybe we can assume UTF-8 as input always? w3m was originally choking on it because the header announced latin-1, I guess.Yes, I think that's correct. I'm almost a 100% that aerc always provides UTF-8 data to the filters since we use the go-message 0 package internally which automatically decodes the charset to UTF-8.
-- View on the web: https://todo.sr.ht/~rjarry/aerc/65#event-191643
It may be wrong from a correctness point of view, though, as the HTML announced charset may now be false.