For reproducibility, downstream needs archives that don't change on every download.
$ curl -sO https://hg.sr.ht/%7Esircmpwn/hg.sr.ht/archive/0.17.0.tar.gz
$ sha256 0.17.0.tar.gz
SHA256 (0.17.0.tar.gz) = 2e08230afef59cda9472516306c7ba2376ba10a0d90672513fb091fc6c0c61f8
$ tar xzf 0.17.0.tar.gz -C before
$ curl -sO https://hg.sr.ht/%7Esircmpwn/hg.sr.ht/archive/0.17.0.tar.gz
$ sha256 0.17.0.tar.gz
SHA256 (0.17.0.tar.gz) = 04e6a3d4313fe333021dc03b88b74b2c220951c23f5c25c5bda11e361879393c
$ tar xzf 0.17.0.tar.gz -C after
$ diff -U8 <(mtree -K sha256 -cip before) <(mtree -K sha256 -cip after)
--- before
+++ after
@@ -1,19 +1,19 @@
# user: holo
# machine: raphael.local
-# tree: /tmp/before
+# tree: /tmp/after
# date: Wed Nov 20 22:53:06 2019
# .
/set type=file uid=1234 gid=0 mode=0755 nlink=1 flags=none
-. type=dir nlink=3 time=1574289857.738323359
+. type=dir nlink=3 time=1574290037.232710825
# ./hg.sr.ht-0.17.0
-hg.sr.ht-0.17.0 type=dir nlink=5 time=1574289857.739530469
+hg.sr.ht-0.17.0 type=dir nlink=5 time=1574290037.233607731
.hg_archival.txt \
mode=0644 size=122 time=1573509795.000000000 \
sha256digest=7323b7a7d3ba0752a3c4f74aad5c525640fc4910af99e30c62d1438899739035
.hgignore mode=0644 size=151 time=1573509795.000000000 \
sha256digest=4e05b8e612cfc5b63606c462f1f1195a192415c45362e5d84adaa33d0ca1efa8
.hgtags mode=0644 size=2186 time=1573509795.000000000 \
sha256digest=b05a4536b207ec700d18363f482bee811c85620172135151d36a5a649a2fc84d
LICENSE mode=0644 size=34520 time=1573509795.000000000 \
@@ -51,30 +51,30 @@ hg.sr.ht-0.17.0 type=dir nlink=5 time=1574289857.73953
run.py size=1904 time=1573509795.000000000 \
sha256digest=d27b3e49cc4e247a0320edcba85e70900a1d74a5265319ac92d8f59b72bed458
setup.py size=2150 time=1573509795.000000000 \
sha256digest=b5107e6dec1687f02aa77c14a36ab1fe9f066fef0bf8271680a8039d4d841734
static type=link time=1573509795.000000000 link=hgsrht/static/
# ./hg.sr.ht-0.17.0/.builds
/set type=file uid=1234 gid=0 mode=0644 nlink=1 flags=none
-.builds type=dir mode=0755 nlink=2 time=1574289857.738408742
+.builds type=dir mode=0755 nlink=2 time=1574290037.232766081
alpine.yml size=1536 time=1573509795.000000000 \
sha256digest=3ad1901a4de957607c1bc4415d9d1a03068104a24b059c596aad982aecfc50b9
archlinux.yml \
size=1142 time=1573509795.000000000 \
sha256digest=e140d31900cf7bff2fa750fa428a0c37fe575f3343402fed43324b4a07ca005e
debian.yml size=1338 time=1573509795.000000000 \
sha256digest=05a69aadba06948ef7df6ed19ee46ff8eaaac8a4101cbccdd7ed07577be56583
# ./hg.sr.ht-0.17.0/.builds
..
# ./hg.sr.ht-0.17.0/hgsrht
-hgsrht type=dir mode=0755 nlink=8 time=1574289857.739467868
+hgsrht type=dir mode=0755 nlink=8 time=1574290037.233568207
app.py size=1703 time=1573509795.000000000 \
sha256digest=e6f5fd9df4feb76c5cd3eca2ff0fb8e4971284d1a15695d2bac8b421bdeb8440
hg.py size=3812 time=1573509795.000000000 \
sha256digest=8d265bb2d223ae23c9836fbe8b814168b2ae4d0d98c8470796780b179703c419
hgwebshim.py \
size=1837 time=1573509795.000000000 \
sha256digest=fae7788623375336cc8003ebbe78cfaa439fba60e63df61949209474ab9dce4b
repos.py size=2980 time=1573509795.000000000 \
@@ -82,25 +82,25 @@ hgsrht type=dir mode=0755 nlink=8 time=157428
service.py size=370 time=1573509795.000000000 \
sha256digest=ffd092eb51e6fbe362e1f73363c0a709a5c6bb62ce470e89de482eae18885333
submit.py size=3337 time=1573509795.000000000 \
sha256digest=36cfc976ce3adf5ea16db46cf93f96e2c96f5320f9f1de242ebfc359e185aca0
webhooks.py size=414 time=1573509795.000000000 \
sha256digest=f5af9a9be16ed8c2741d02e0f41a4258fc136776a9aa9fa0868af96165f15fb8
# ./hg.sr.ht-0.17.0/hgsrht/alembic
-alembic type=dir mode=0755 nlink=3 time=1574289857.738772507
+alembic type=dir mode=0755 nlink=3 time=1574290037.233071376
env.py size=72 time=1573509795.000000000 \
sha256digest=47ccfa69be3a0e4b609a69b8e60e5effaa29cee27f7ef182b5e10b83d1b46e52
script.py.mako \
size=412 time=1573509795.000000000 \
sha256digest=0fc905238e3ff6f04966b0184a46710d35f6f92e58fa811eb4477d04a968f52f
# ./hg.sr.ht-0.17.0/hgsrht/alembic/versions
-versions type=dir mode=0755 nlink=2 time=1574289857.738848287
+versions type=dir mode=0755 nlink=2 time=1574290037.233133204
07d78f270a70_add_user_webhook_table.py \
size=1815 time=1573509795.000000000 \
sha256digest=86e3e7b131a0077759c7830c7f2ecc38cd60e69ff7b40b8698c176cbdb27566e
43fff2508875_add_source_repo_id_to_repository.py \
size=597 time=1573509795.000000000 \
sha256digest=aee6864bf3c7f15ec857a9e83096974bccf47ebf81816577a5f5df30885d9fcc
70bea64c0008_add_ssh_key_table.py \
size=697 time=1573509795.000000000 \
@@ -114,53 +114,53 @@ versions type=dir mode=0755 nlink=2 time=157428
# ./hg.sr.ht-0.17.0/hgsrht/alembic/versions
..
# ./hg.sr.ht-0.17.0/hgsrht/alembic
..
# ./hg.sr.ht-0.17.0/hgsrht/blueprints
-blueprints type=dir mode=0755 nlink=2 time=1574289857.738942912
+blueprints type=dir mode=0755 nlink=2 time=1574290037.233212256
__init__.py size=0 time=1573509795.000000000 \
sha256digest=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
extramanage.py \
size=1687 time=1573509795.000000000 \
sha256digest=a0c2f842dc55b1cd19ae98d7f0e71e74b442d6394481ee47dfa36da5326161bb
internal.py size=3722 time=1573509795.000000000 \
sha256digest=eeb78d31902c837b77edc632b8595d42ce5b8cf9bf4237cb4dc0fef73aae9162
repo.py size=26143 time=1573509795.000000000 \
sha256digest=23a62e479a8deb7d05d8e4469e8465694a06d93fc3f7e755ed945cf943038438
stats.py size=1997 time=1573509795.000000000 \
sha256digest=6c539115a3e6ae146803019703e9d11ad8b5f33020adfb4e8c3e172c86bd1cfa
# ./hg.sr.ht-0.17.0/hgsrht/blueprints
..
# ./hg.sr.ht-0.17.0/hgsrht/hgext
-hgext type=dir mode=0755 nlink=2 time=1574289857.739109314
+hgext type=dir mode=0755 nlink=2 time=1574290037.233342107
__init__.py size=2598 time=1573509795.000000000 \
sha256digest=a539459733c5caed8764d7fea0846f91d65ede78d35b5ef4b6daf1996e1c2cf1
# ./hg.sr.ht-0.17.0/hgsrht/hgext
..
# ./hg.sr.ht-0.17.0/hgsrht/hgrcs
-hgrcs type=dir mode=0755 nlink=2 time=1574289857.739148434
+hgrcs type=dir mode=0755 nlink=2 time=1574290037.233368844
global.cfg size=64 time=1573509795.000000000 \
sha256digest=9b946f0fe59cf789167a227ac3a5a556e6ba75398134c00cc962dacbc486006c
nonpublishing.cfg \
size=55 time=1573509795.000000000 \
sha256digest=6785539811e9c661ecc436be9b4c10feec05e75ba9785af4b402557b2af7b7f8
# ./hg.sr.ht-0.17.0/hgsrht/hgrcs
..
# ./hg.sr.ht-0.17.0/hgsrht/templates
-templates type=dir mode=0755 nlink=2 time=1574289857.739414544
+templates type=dir mode=0755 nlink=2 time=1574290037.233534144
bookmarks.html \
size=103 time=1573509795.000000000 \
sha256digest=f2f4a381b07ed343f495c54319a309be9889615f3c1fcee444500afcce74e0a1
branches.html \
size=103 time=1573509795.000000000 \
sha256digest=f2f4a381b07ed343f495c54319a309be9889615f3c1fcee444500afcce74e0a1
dashboard.html \
size=280 time=1573509795.000000000 \
@@ -186,30 +186,30 @@ templates type=dir mode=0755 nlink=2 time=157428
sha256digest=63cedca68a50f2b348ed31ad9e4988cd1745509ce6bcb780c864e8fdeba8ffc8
tags.html size=105 time=1573509795.000000000 \
sha256digest=f5177a9dd9d0f4ef014ccacb4b4f280f9d54fdb9875b0a4c992ced51850b051f
# ./hg.sr.ht-0.17.0/hgsrht/templates
..
# ./hg.sr.ht-0.17.0/hgsrht/types
-types type=dir mode=0755 nlink=2 time=1574289857.739453373
+types type=dir mode=0755 nlink=2 time=1574290037.233558622
__init__.py size=732 time=1573509795.000000000 \
sha256digest=410b923ebd6e43ef3cacc1eaba62d225385065d41f09f0b08e67790924c31385
sshkey.py size=631 time=1573509795.000000000 \
sha256digest=eb212a64002703a8cb2d56bfc17e707aa44a8d1caba5f73b2b90c496490d14cb
# ./hg.sr.ht-0.17.0/hgsrht/types
..
# ./hg.sr.ht-0.17.0/hgsrht
..
# ./hg.sr.ht-0.17.0/scss
-scss type=dir mode=0755 nlink=2 time=1574289857.739497039
+scss type=dir mode=0755 nlink=2 time=1574290037.233589167
main.scss size=2987 time=1573509795.000000000 \
sha256digest=758d8e58007980a0cd6f5fa66798ea5e55732ff5de1071988162ac6c8cdf9d02
# ./hg.sr.ht-0.17.0/scss
..
# ./hg.sr.ht-0.17.0
..
From discussion on lists:
Looks like "-n" needs to be added to gzip(1):
-n, --no-name This option stops the filename and timestamp from being stored in the output file.
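The effect of that option can be reproduced with Python's standard gzip module. This is a minimal sketch, not the hg.sr.ht code: gzip_bytes is a hypothetical helper, and the names and timestamps passed to it are made-up sample values. An empty name and a zero mtime play the role of gzip -n.

```python
import gzip
import io


def gzip_bytes(data: bytes, name: str = "", mtime: float = 0) -> bytes:
    """Compress data; name and mtime are what end up in the gzip header."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf, mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


payload = b"example tarball contents"

# Same payload, different header metadata: the compressed bytes differ.
a = gzip_bytes(payload, name="random-1.tar", mtime=1574289857)
b = gzip_bytes(payload, name="random-2.tar", mtime=1574290037)

# Empty name and zero mtime: the output depends only on the payload.
c = gzip_bytes(payload)
d = gzip_bytes(payload)

print(a == b)  # False
print(c == d)  # True
```

Only the header differs between a and b; the compressed payload is the same, which is why the extracted files compare equal while the archive checksums do not.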
Is anything happening with regards to this bug? It makes it impossible to create stable Gentoo packages, since we need reproducible tarballs.
I would like to gently remind that this is still an issue for distributions packaging software hosted here. Given that the fix should be relatively simple (adding the -n parameter to the call of gzip), it would be nice if this could be addressed.
Unless I'm more blind than usual, the fix has required and still requires patching the upstream mercurial server, since the actual packaging call is
hg_repo.client.archive(path.encode(), rev=rev, prefix=basename, type="tgz")
and mercurial uses a homebrew archiver (https://www.mercurial-scm.org/repo/hg/file/d42809b6b10f/mercurial/archival.py#l134), which shouldn't be too difficult to fix if you have a repro, which it looks like you do.
Thanks for the pointer. I'm pretty unfamiliar with hg as I usually work with git, but to me it looks like setting the mtime parameter in the call to archive() to the timestamp of the last change would fix the issue. If that is true, no change in upstream mercurial would be required.
Maybe someone familiar with the source can quickly add the parameter and check if this fixes the issue?
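The idea of pinning timestamps can be illustrated outside Mercurial. The following is a rough sketch using only Python's standard tarfile and gzip modules. It is not the hg.sr.ht or Mercurial code, and build_tgz, its arguments, and the sample file contents are hypothetical names and values chosen for the illustration.

```python
import gzip
import io
import tarfile


def build_tgz(files: dict, prefix: str, mtime: int) -> bytes:
    """Pack files into a .tar.gz whose bytes depend only on the inputs."""
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # Sort the names so the member order does not depend on dict order.
        for name in sorted(files):
            info = tarfile.TarInfo(f"{prefix}/{name}")
            info.size = len(files[name])
            info.mtime = mtime  # pinned, e.g. to the changeset timestamp
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(files[name]))
    out = io.BytesIO()
    # Empty name and fixed mtime keep the gzip header stable as well.
    with gzip.GzipFile(filename="", mode="wb", fileobj=out, mtime=mtime) as gz:
        gz.write(tar_buf.getvalue())
    return out.getvalue()


sample = {"setup.py": b"print('x')\n", "run.py": b"pass\n"}
first = build_tgz(sample, "hg.sr.ht-0.17.0", 1573509795)
second = build_tgz(sample, "hg.sr.ht-0.17.0", 1573509795)
print(first == second)
```

Both the tar member headers and the gzip header carry a timestamp, so a reproducible archive needs both of them fixed.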
I don't believe the fix requires any upstream changes.
It looks to me like the reported timestamp differences reflect the time at which tar was invoked. I can reproduce the behavior with a single tar file extracted to two directories.
$ mkdir /tmp/first
$ mkdir /tmp/second
$ tar xzf 0.17.0.tar.gz -C /tmp/first
$ tar xzf 0.17.0.tar.gz -C /tmp/second
$ mtree -K sha256digest -cip /tmp/first > /tmp/first.txt
$ mtree -K sha256digest -cip /tmp/second > /tmp/second.txt
$ diff /tmp/first.txt /tmp/second.txt
--- /tmp/first.txt Wed Dec 9 22:33:58 2020
+++ /tmp/second.txt Wed Dec 9 22:34:13 2020
@@ -1,14 +1,14 @@
-# tree: /tmp/first
-# date: Wed Dec 9 22:33:58 2020
+# tree: /tmp/second
+# date: Wed Dec 9 22:34:13 2020
# .
/set type=file uid=1000 gid=0 mode=0755 nlink=1
-. type=dir nlink=3 time=1607571153.604606590
+. type=dir nlink=3 time=1607571182.794670190
# ./hg.sr.ht-0.17.0
- hg.sr.ht-0.17.0 type=dir nlink=5 time=1607571153.604606590
+ hg.sr.ht-0.17.0 type=dir nlink=5 time=1607571182.794670190
.hg_archival.txt \
mode=0644 size=122 time=0.0 \
sha256digest=7323b7a7d3ba0752a3c4f74aad5c525640fc4910af99e30c62d1438899739035
@@ -54,7 +54,7 @@
# ./hg.sr.ht-0.17.0/.builds
/set type=file uid=1000 gid=0 mode=0644 nlink=1
.builds type=dir mode=0755 nlink=2 \
- time=1607571153.574606440
+ time=1607571182.764701552
alpine.yml size=1536 time=0.0 \
sha256digest=3ad1901a4de957607c1bc4415d9d1a03068104a24b059c596aad982aecfc50b9
archlinux.yml size=1142 time=0.0 \
I think the issue is in how the archive is generated with a randomized name before being sent. The "created with" name is visible on each download (contained in the .gz file) and triggers the checksum mismatch:
$ file 0.17.0.tar.gz
0.17.0.tar.gz: gzip compressed data, was "0.17.0b'0c7421f703e0a468'.tar", last modified: Mon Nov 11 22:03:15 2019, max compression
I've submitted a patch that I think should fix this:
Fixed by Nolan in 701795ec3596.
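For checking a download without file(1), the two header fields it reported above (the original name and the modification time) can be read directly from the first bytes of the archive. This is a minimal sketch following the gzip header layout in RFC 1952; read_gzip_header is a hypothetical helper, and the sample archive is generated in memory with made-up values rather than taken from hg.sr.ht.

```python
import gzip
import io
import struct


def read_gzip_header(data: bytes):
    """Return (mtime, original_name) from a gzip member header (RFC 1952)."""
    magic, method, flags, mtime, _xfl, _os = struct.unpack("<2sBBIBB", data[:10])
    if magic != b"\x1f\x8b" or method != 8:
        raise ValueError("not a gzip/deflate stream")
    pos = 10
    if flags & 0x04:  # FEXTRA: 2-byte length followed by that many bytes
        (xlen,) = struct.unpack("<H", data[pos:pos + 2])
        pos += 2 + xlen
    name = None
    if flags & 0x08:  # FNAME: zero-terminated Latin-1 string
        end = data.index(b"\x00", pos)
        name = data[pos:end].decode("latin-1")
    return mtime, name


# In-memory sample standing in for a downloaded archive.
buf = io.BytesIO()
with gzip.GzipFile(filename="0.17.0-sample.tar", mode="wb", fileobj=buf,
                   mtime=1573509795) as gz:
    gz.write(b"tar bytes would go here")

print(read_gzip_header(buf.getvalue()))
```

Running the helper on two downloads of the same tag, for example on open("0.17.0.tar.gz", "rb").read(512), shows whether the stored name or timestamp is what changes between them.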