sr.ht's source tarballs may not be deterministically generated

git-archive produces a stable .tar output, but then hands compression off to an external compressor: namely, the gzip binary. Let's compare downloads of the same tag across source hosting platforms:

https://github.com/archlinux/devtools/archive/20190821.tar.gz
https://git.archlinux.org/devtools.git/snapshot/devtools-20190821.tar.gz
https://gitlab.com/eschwartz/devtools/-/archive/20190821/devtools-20190821.tar.gz
https://git.sr.ht/~eschwartz/devtools/archive/20190821.tar.gz

$ sha256sum *devtools-20190821.tar.gz
4557e5db0225db0aab0d26b853907b3308037a05231519a69f7882ee2168b3b3  cgit-devtools-20190821.tar.gz
4557e5db0225db0aab0d26b853907b3308037a05231519a69f7882ee2168b3b3  github-devtools-20190821.tar.gz
4557e5db0225db0aab0d26b853907b3308037a05231519a69f7882ee2168b3b3  gitlab-devtools-20190821.tar.gz
fe222eb819bf0dd410ab6a3201fc196961746e3b2f1866dae5ca5d27142da208  srht-devtools-20190821.tar.gz

One of these tarballs is not like the others! However, the underlying tar is the same.

$ gzip -dk github-devtools-20190821.tar.gz srht-devtools-20190821.tar.gz
$ sha256sum *devtools-20190821.tar
528100dae1d0c2a4747b43b818e6a8776dc66723afcba33e615baac9874eac77  github-devtools-20190821.tar
528100dae1d0c2a4747b43b818e6a8776dc66723afcba33e615baac9874eac77  srht-devtools-20190821.tar
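The same comparison can be done without writing the intermediate .tar files to disk. This is only a rough sketch: payload_sha256 is a made-up helper name, and it assumes gzip and sha256sum are on the PATH.

```shell
# payload_sha256 is a hypothetical helper, not part of any tool mentioned here.
# It hashes the decompressed stream of each archive, so differences in the
# gzip layer (header fields, compressor implementation) are ignored.
payload_sha256() {
    for f in "$@"; do
        sum=$(gzip -dc -- "$f" | sha256sum | cut -d' ' -f1)
        printf '%s  %s\n' "$sum" "$f"
    done
}
```

Called as payload_sha256 *devtools-20190821.tar.gz, archives whose first column is identical contain the same underlying tar stream, even when the sha256sum of the .tar.gz files themselves differs.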

It seems sr.ht is hosted on Alpine, where the gzip binary is provided by busybox. After ssh'ing into a builds.sr.ht Alpine image, running gzip -n on the .tar with that busybox build reproduces the same tarball. So this is where git.sr.ht gets its unusual output.
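For anyone wanting to repeat that check on their own machine, something like the following should do. It is a sketch under assumptions: recompress_matches is a hypothetical helper name, and it assumes the host compressed with plain gzip -n at the default level, which is what the test above suggests but is not confirmed.

```shell
# recompress_matches is a hypothetical helper: it decompresses the archive,
# recompresses the stream with whatever `gzip -n` is first on the PATH,
# and compares sha256 sums of the original and the rebuilt file.
# Exit status 0 means the local gzip reproduces the archive byte for byte.
recompress_matches() {
    original=$(sha256sum < "$1" | cut -d' ' -f1)
    rebuilt=$(gzip -dc -- "$1" | gzip -n | sha256sum | cut -d' ' -f1)
    printf 'original: %s\nrebuilt:  %s\n' "$original" "$rebuilt"
    [ "$original" = "$rebuilt" ]
}
```

Running recompress_matches srht-devtools-20190821.tar.gz on different systems shows whether the local gzip implementation produces the same bytes as the one the host used.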

Does busybox guarantee a stable output? It is certainly not generating the exact same bytes as GNU gzip is.

More worryingly, I cannot reproduce this output on my Arch Linux laptop. My busybox gzip -n produces the following sha256sum: 4449fda607906c232ba753c9a5b3299ce4b14750aab1ad1da65a3f774df43a8b

It seems that, at least to some extent, the output you get from busybox gzip depends on which version and/or build of busybox you have. Maybe it would be better to require sr.ht to be hosted on a system with a non-busybox gzip.


~eschwartz 5 months ago

This has interesting applications for https://todo.sr.ht/~sircmpwn/git.sr.ht/231, because if the gzip compressor is unreliable it may be better to advise users to sign their sources via git notes --ref=refs/notes/signatures/tar, not tar.gz (of course, an argument could be made that that is more advisable even without this).

~eschwartz 5 months ago

See http://lists.busybox.net/pipermail/busybox/2019-September/087449.html

I will need to double-check that reality plays out as expected, but the next major.minor busybox release should change its gzip output. That invalidates the checksums of all existing busybox-generated gzip files, and new output should instead match what GNU gzip creates (where things will hopefully remain permanently).

~eschwartz 4 months ago

So it turns out that this is a general issue, and it is also the cause of https://github.com/swaywm/sway/issues/4603
