sr.ht's source tarballs may not be deterministically generated

git-archive produces stable .tar output, but it then hands the result off to an external compressor, namely the gzip binary. Let's compare downloads of the same tag across several source hosting platforms:

https://github.com/archlinux/devtools/archive/20190821.tar.gz
https://git.archlinux.org/devtools.git/snapshot/devtools-20190821.tar.gz
https://gitlab.com/eschwartz/devtools/-/archive/20190821/devtools-20190821.tar.gz
https://git.sr.ht/~eschwartz/devtools/archive/20190821.tar.gz

$ sha256sum *devtools-20190821.tar.gz
4557e5db0225db0aab0d26b853907b3308037a05231519a69f7882ee2168b3b3  cgit-devtools-20190821.tar.gz
4557e5db0225db0aab0d26b853907b3308037a05231519a69f7882ee2168b3b3  github-devtools-20190821.tar.gz
4557e5db0225db0aab0d26b853907b3308037a05231519a69f7882ee2168b3b3  gitlab-devtools-20190821.tar.gz
fe222eb819bf0dd410ab6a3201fc196961746e3b2f1866dae5ca5d27142da208  srht-devtools-20190821.tar.gz

One of these tarballs is not like the others! However, the underlying tar is the same.

$ gzip -dk github-devtools-20190821.tar.gz srht-devtools-20190821.tar.gz
$ sha256sum *devtools-20190821.tar
528100dae1d0c2a4747b43b818e6a8776dc66723afcba33e615baac9874eac77  github-devtools-20190821.tar
528100dae1d0c2a4747b43b818e6a8776dc66723afcba33e615baac9874eac77  srht-devtools-20190821.tar
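The comparison above can also be done without leaving decompressed copies on disk. This is a minimal sketch, not something any of the hosting platforms provide: the function names payload_sha256 and same_payload are made up for illustration, and it assumes gzip, sha256sum and cut are on PATH.

```shell
# Hypothetical helper: print the sha256 of the decompressed contents of a .gz file.
payload_sha256() {
    # Fail early on a missing or corrupt archive; otherwise the pipeline
    # below would quietly hash an empty or truncated stream.
    gzip -t -- "$1" || return 1
    gzip -dc -- "$1" | sha256sum | cut -d' ' -f1
}

# Hypothetical helper: return 0 if both archives wrap the same payload,
# 1 if the payloads differ, 2 if either archive could not be read.
same_payload() {
    a=$(payload_sha256 "$1") || return 2
    b=$(payload_sha256 "$2") || return 2
    [ "$a" = "$b" ]
}
```

For example, same_payload github-devtools-20190821.tar.gz srht-devtools-20190821.tar.gz should succeed for the files above, since their .tar checksums match even though the .tar.gz checksums do not.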

It seems sr.ht is hosted on Alpine, where the gzip binary is provided by busybox. After ssh'ing into a builds.sr.ht Alpine image and running gzip -n on the .tar, that busybox build reproduces the same tarball that git.sr.ht serves. So this is where git.sr.ht is getting the unusual output.

Does busybox guarantee a stable output? It is certainly not generating the exact same bytes as GNU gzip is.
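One way to narrow down where two gzip files diverge is to look at the fixed 10-byte header defined by the gzip format (RFC 1952): two magic bytes (1f 8b), the compression method, a flags byte, a 4-byte mtime, an extra-flags byte, and an OS byte. The sketch below only dumps those bytes. gzip_header is a hypothetical helper name, and I have not checked whether the busybox and GNU outputs discussed here differ in the header, in the deflate stream, or both.

```shell
# Hypothetical helper: print the 10-byte fixed gzip header as hex pairs.
# Field order per RFC 1952: ID1 ID2 CM FLG MTIME(4 bytes) XFL OS
gzip_header() {
    [ -r "$1" ] || return 1
    # Word splitting on the unquoted substitution normalizes od's spacing.
    set -- $(od -An -tx1 -N10 < "$1")
    echo "$*"
}
```

Running gzip_header on two archives and comparing the lines shows whether the headers already disagree. If the headers match, cmp on the two files reports the offset of the first differing byte in the compressed data.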

More worryingly, I cannot reproduce this output on my Arch Linux laptop. My busybox gzip -n produces the following sha256sum: 4449fda607906c232ba753c9a5b3299ce4b14750aab1ad1da65a3f774df43a8b

It seems that, to at least some extent, the output you get from busybox gzip depends on which version and/or build of busybox you have. Maybe it would be better to require sr.ht to be hosted on a system with a non-busybox gzip.

~eschwartz 20 days ago

This has interesting implications for https://todo.sr.ht/~sircmpwn/git.sr.ht/231: if the gzip compressor is unreliable, it may be better to advise users to sign their sources via git notes --ref=refs/notes/signatures/tar, not tar.gz (of course, an argument could be made that this is more advisable anyway).

~eschwartz 14 days ago

See http://lists.busybox.net/pipermail/busybox/2019-September/087449.html

I will need to double-check that reality plays out as expected, but the next major.minor busybox release should change busybox gzip's output, invalidating the checksums of all existing busybox-generated gzip files, so that it instead matches what GNU gzip creates (where things will hopefully remain permanently).
