- Add keep-alive connections [...]
- CGI is one-shot only, so it needs a new protocol:
Another option is to daemonize the CGI process, and then connect to it from new processes coming from the same loader client (e.g. using a session cookie). I doubt it makes things much easier, but who knows.
- Remove libcurl dependency?
We're also getting blocked by some sites (well, Cloudflare) because of curl's header order. There is no way to adjust this (apart from using curl-impersonate), so curl must go.
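For illustration, once we write the request ourselves, the header order is simply whatever we choose to emit. A minimal sketch (the header names and values here are placeholders, not newhttp's actual defaults):

    #include <stddef.h>
    #include <stdio.h>

    /* Illustrative only: with a hand-rolled client the headers go out in
     * exactly the order we list them, which libcurl does not let us
     * control. Names and values are placeholders. */
    static const char *headers[] = {
        "Host: example.org",
        "User-Agent: examplebrowser/1.0",
        "Accept: */*",
        "Accept-Encoding: gzip, deflate",
        "Connection: close",
    };

    static void write_request(int fd, const char *path)
    {
        size_t i;
        dprintf(fd, "GET %s HTTP/1.1\r\n", path);
        for (i = 0; i < sizeof(headers) / sizeof(*headers); i++)
            dprintf(fd, "%s\r\n", headers[i]);
        dprintf(fd, "\r\n");
    }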
- need an alternative inflate: could use stbi or miniz. I think miniz is faster, but needs benchmarking. (miniz license also changed from PD to MIT, but the SDL2 version is still PD...)
stbi is out of the question: it runs once per IDAT (i.e. it can't stream). miniz it is.
For miniz, it seems 1.16 beta was the last release that touched the decompressor, and it is still PD. MIT miniz only has two bugfixes in tinfl. (SDL is still on 1.15 for some reason.)
Better yet, 1.16 beta also has a separate tinfl.h version, which is all we need. It only supports deflate, but it's easy to add gzip.
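To sketch the gzip part (this is just the RFC 1952 framing; the function and names below are mine, not tinfl's): strip the header, feed what follows to the raw deflate decoder, then check the CRC32/ISIZE trailer afterwards.

    #include <stddef.h>
    #include <stdint.h>

    /* Return the offset of the raw deflate stream in a gzip member
     * (RFC 1952), or -1 if the header is invalid or truncated.
     * Sketch only; a streaming client would do the same incrementally. */
    static long gzip_payload_offset(const uint8_t *p, size_t n)
    {
        size_t i = 10; /* fixed part: magic, CM, FLG, MTIME, XFL, OS */
        uint8_t flg;
        if (n < 10 || p[0] != 0x1f || p[1] != 0x8b || p[2] != 8)
            return -1;
        flg = p[3];
        if (flg & 0x04) { /* FEXTRA: 2-byte little-endian length + data */
            if (i + 2 > n)
                return -1;
            i += 2 + (p[i] | (size_t)p[i + 1] << 8);
        }
        if (flg & 0x08) { /* FNAME: NUL-terminated string */
            while (i < n && p[i])
                i++;
            i++; /* skip the NUL */
        }
        if (flg & 0x10) { /* FCOMMENT: NUL-terminated string */
            while (i < n && p[i])
                i++;
            i++;
        }
        if (flg & 0x02) /* FHCRC: 2-byte header CRC */
            i += 2;
        return i > n ? -1 : (long)i;
    }

The trailer still has to be verified after inflating, but tinfl itself never needs to know about any of this.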
I've also seen servers send brotli even when we don't advertise it in Accept-Encoding (which is why we added it to w3m). So libbrotli is needed for best compatibility (and probably zstd too).
We might also want to support layered compression, but I don't know if any servers use it. (Combined with Transfer-Encoding, it gets quite difficult...)
- mbedtls?
- looks like it's smaller than OpenSSL, but I don't know how well it behaves with OS certs
You have to manually look them up with OpenSSL too, so the question is moot.
Anyway, the options are:
OpenSSL is the most convenient, as we already wrap it for Gemini. But it is twice as large as the browser itself.
mbedTLS is fairly small, and seems complete enough for web browsing. In practice, TLS fingerprinters (again, Cloudflare) may block it.
NSS is another option, with an acceptable size too. Firefox uses it, so it may be easier to convince TLS fingerprinters that we are a "real" browser. However, it is MPL2 licensed, and its documentation is so bad that even curl dropped it.
I think for now it's best to stick with OpenSSL. We can switch to NSS if we still get blocked after dropping curl, or mbedTLS otherwise.
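As for the OS cert lookup mentioned above, with OpenSSL it amounts to roughly this (sketch only; the fallback path is a common Linux location, not something universal):

    #include <openssl/ssl.h>

    /* Sketch: point an SSL_CTX at the system trust store.
     * SSL_CTX_set_default_verify_paths() uses whatever paths OpenSSL was
     * configured with at build time; the explicit fallback below is just
     * a guess that happens to match many Linux distros. */
    static int load_os_certs(SSL_CTX *ctx)
    {
        if (SSL_CTX_set_default_verify_paths(ctx) == 1)
            return 1;
        return SSL_CTX_load_verify_locations(ctx,
            "/etc/ssl/certs/ca-certificates.crt", "/etc/ssl/certs");
    }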
A WIP version of "newhttp" is now available in the bonus/ directory. Status is tracked in the bonus/README.md file.
For now, its only benefit over the default libcurl client is a different header order (and fewer dependencies). I know at least one site that stops blocking us with newhttp, but on others it might have the opposite effect...
(Speaking of blocking, we're now being blocked by sourcehut (!), apparently because prepending Mozilla/5.0 to the UA string is a mortal sin. Sigh.)
I've restored the old UA string because of Anubis.
I'm also testing newhttp and have seen better results than with the libcurl client. So I'll probably make it the default sooner than planned. If anybody is reading this, I'd appreciate some help:
- From the repository root, run:
    cd bonus && make install-newhttp
- Check your favorite websites that worked with the libcurl client and report back whether they work with newhttp.
Update: I've dropped brotli support, as I'm not convinced it's worth it after all. This means you no longer have to modify default-headers to use newhttp; only gzip and deflate are sent in Accept-Encoding. It also means that newhttp no longer requires any new dependencies.
Another improvement since the previous post: Content-Length is now respected, so if the server sends more bytes than promised, we ignore those. (Unless chunked transfer encoding is used; in fact, we're just reusing the chunk size indicator...)
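Roughly, the body reader now does something like this (names are made up for illustration, not the actual loader code):

    #include <stddef.h>
    #include <string.h>

    /* Sketch of capping the body at Content-Length: "remaining" starts at
     * the advertised length (or at the current chunk size when chunked
     * transfer encoding is used, which is why the same counter can be
     * reused); anything past it is dropped. */
    static size_t take_body(const char *in, size_t inlen, size_t *remaining,
                            char *out)
    {
        size_t n = inlen < *remaining ? inlen : *remaining;
        memcpy(out, in, n);
        *remaining -= n;
        return n; /* bytes accepted; the trailing inlen - n are ignored */
    }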
Overall, newhttp seems to be working quite well, so if things go according to plan it will switch places with the libcurl client soon. (I'll keep the libcurl client in bonus/ so that building with curl-impersonate remains an option.)
newhttp is now the default HTTP(S) client on master. Goodbye curl :P
(The old client remains available in bonus/ as
    make install-curlhttp
)

Next step: do something about persistent connections. I'm thinking a new protocol would be ideal, since neither SCGI nor FastCGI can implement some zero-copy semantics that our CGI currently has (mainly for image decoding).
This could also be a subset of the current protocol between buffers and the loader. Then it could be further developed into an RPC scheme, where CGI scripts can send messages by POSTing to built-in URLs, or just call into other CGI scripts.
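As for the zero-copy part mentioned above: one common way to get that kind of handoff is to pass file descriptors over a Unix socket with SCM_RIGHTS, so the receiver reads the data directly instead of having it copied through the protocol stream. A minimal sketch of the sending side (purely illustrative, not the actual loader code or protocol):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Hypothetical sketch: hand an open fd (say, a pipe carrying decoded
     * image data) to another process over a Unix socket. SCGI and FastCGI
     * have no way to express this. */
    static int send_fd(int sock, int fd)
    {
        char byte = 0;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union {
            struct cmsghdr hdr;
            char buf[CMSG_SPACE(sizeof(int))];
        } u;
        struct msghdr msg = { 0 };
        struct cmsghdr *cmsg;

        memset(&u, 0, sizeof(u));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof(u.buf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    }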