~sircmpwn/git.sr.ht#279: 
repo:post-update webhook is not able to handle large change

Error displayed:

    remote: Resolving deltas: 100% (147/147), done.
    remote: Failed to execute stage 3: argument list too long

Repro steps:

1. create new empty repository
2. create single commit on master (whatever content)
3. for i in `seq 1 150`; do g co -b b$i; touch bar$i; g a bar$i; g c -m "add bar$i"; g co master; done
4. for i in `seq 1 150`; do g co b$i; g tag t_b$i; g co master; done
5. git push --mirror origin

Status: RESOLVED FIXED
Submitter: ~graywolf
Assigned to: No-one
Submitted: 4 months ago
Updated: a month ago
Labels: No labels applied.

~nabijaczleweli a month ago

My initial thought after being unable to reproduce was that this was something ARG_MAX related, so I prepped this instead:

for i in $(seq 1 15000); do git checkout -b b$i; touch bar$i; git add bar$i; git commit -m "add bar$i"; git checkout master; done
for i in $(seq 1 15000); do git tag -m "$i" t_b$i refs/heads/b$i; done

for 30`000 pushed refs, and the final line in /var/log/gitsrht-update-hook, starting with "hooks/post-update 2020/08/04 20:06:39 [hooks/post-update refs/heads/b1", was 536`487 bytes long, so, even on my Alpine dev VM with its relatively tiny ARG_MAX of 131072 (for comparison, Debians set it to 2M) I couldn't reproduce this.

The line is gonna be (len("refs/tags/")+tag_name_len)*N_tags+(len("refs/heads/")+branch_name_len)*N_branches+C bytes long, which, assuming long tags, very long branches and large overhead is (10+10)*N_tags+(11+19)*N_branches+15 or 20N_refs+15, so to run into the ARG_MAX limit you'd need 20N_refs+15 > 131072, or N_refs > 6552 in a single push, which isn't gonna cause problems for any Linux distribution. POSIX defines this as only 4k at minimum, but going by https://www.in-ulm.de/~mascheck/various/argmax/#results all common Berkeley distributions go well above that.
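The estimate above can be checked with a few lines of C. This is only a sketch of the arithmetic in this comment: the function names are made up for illustration, and the name lengths and the overhead of 15 bytes are the same guesses as above. Note that the full formula charges 20 bytes per tag and 30 per branch; the 20N_refs+15 figure is the simplified version.

```c
#include <assert.h>

/* Estimated length in bytes of the logged hook line, following the formula
 * in the comment: (len("refs/tags/") + tag_len) * n_tags
 *               + (len("refs/heads/") + branch_len) * n_branches + overhead.
 * Hypothetical helper, not part of git or git.sr.ht. */
long estimate_line_bytes(long n_tags, long tag_len,
                         long n_branches, long branch_len, long overhead) {
	assert(n_tags >= 0 && n_branches >= 0);
	long per_tag = (long)(sizeof("refs/tags/") - 1) + tag_len;          /* 10 + tag_len */
	long per_branch = (long)(sizeof("refs/heads/") - 1) + branch_len;   /* 11 + branch_len */
	return per_tag * n_tags + per_branch * n_branches + overhead;
}

/* Largest ref count n for which per_ref * n + overhead still fits in arg_max. */
long max_refs_within(long arg_max, long per_ref, long overhead) {
	assert(per_ref > 0 && arg_max >= overhead);
	return (arg_max - overhead) / per_ref;
}
```

With the numbers used above, max_refs_within(131072, 20, 15) gives 6552, and the original 150-branch, 150-tag repro comes out at only a few kilobytes, nowhere near the limit.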

Beyond that, I don't know what could possibly cause this error when exec()ing the hook.

I also pushed my horrible repo to git.sr.ht proper (sorry, Drew!) and it worked as well.

~graywolf a month ago

Hi,

On 2020-08-04 19:00:06 -0000, ~nabijaczleweli wrote:

> My initial thought after being unable to reproduce was that this was something ARG_MAX related, so I prepped this instead:
>
> for i in $(seq 1 15000); do git checkout -b b$i; touch bar$i; git add bar$i; git commit -m "add bar$i"; git checkout master; done
> for i in $(seq 1 15000); do git tag -m "$i" t_b$i refs/heads/b$i; done
>
> for 30`000 pushed refs, and the final line in /var/log/gitsrht-update-hook, starting with "hooks/post-update 2020/08/04 20:06:39 [hooks/post-update refs/heads/b1", was 536`487 bytes long, so, even on my Alpine dev VM with its relatively tiny ARG_MAX of 131072 (for comparison, Debians set it to 2M) I couldn't reproduce this.
>
> The line is gonna be (len("refs/tags/")+tag_name_len)*N_tags+(len("refs/heads/")+branch_name_len)*N_branches+C bytes long, which, assuming long tags, very long branches and large overhead is (10+10)*N_tags+(11+19)*N_branches+15 or 20N_refs+15, so to run into the ARG_MAX limit you'd need 20N_refs+15 > 131072, or N_refs > 6552 in a single push, which isn't gonna cause problems for any Linux distribution. POSIX defines this as only 4k at minimum, but going by https://www.in-ulm.de/~mascheck/various/argmax/#results all common Berkeley distributions go well above that.
>
> Beyond that, I don't know what could possibly cause this error when exec()ing the hook.
>
> I also pushed my horrible repo to git.sr.ht proper (sorry, Drew!) and it worked as well.

Thank you for trying to reproduce it and I'm sorry it did not work. I've just tried again, and it failed again. Here are snippets from my try:

    +$ git clone git@git.sr.ht:~graywolf/repro-279
    Cloning into 'repro-279'...
    warning: You appear to have cloned an empty repository.
    +$ cd repro-279
    +$ touch foo && git add foo && git commit -m "add foo"
    [master (root-commit) 1ed2511] add foo
     1 file changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 foo
    +$ for i in `seq 1 150`; do g co -b b$i; touch bar$i; g a bar$i; g c -m "add bar$i"; g co master; done
    Switched to a new branch 'b1'
    [b1 8d8dba6] add bar1
     1 file changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 bar1
    Switched to branch 'master'
    Your branch is based on 'origin/master', but the upstream is gone.
      (use "git branch --unset-upstream" to fixup)
    . . .
    Switched to a new branch 'b150'
    [b150 6014a39] add bar150
     1 file changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 bar150
    Switched to branch 'master'
    Your branch is based on 'origin/master', but the upstream is gone.
      (use "git branch --unset-upstream" to fixup)
    +$ for i in `seq 1 150`; do g co b$i; g tag t_b$i; g co master; done
    Switched to branch 'b1'
    Switched to branch 'master'
    Your branch is based on 'origin/master', but the upstream is gone.
      (use "git branch --unset-upstream" to fixup)
    . . .
    Switched to branch 'b150'
    Switched to branch 'master'
    Your branch is based on 'origin/master', but the upstream is gone.
      (use "git branch --unset-upstream" to fixup)
    +$ git push --mirror origin
    Enumerating objects: 303, done.
    Counting objects: 100% (303/303), done.
    Delta compression using up to 4 threads
    Compressing objects: 100% (301/301), done.
    Writing objects: 100% (303/303), 126.46 KiB | 3.61 MiB/s, done.
    Total 303 (delta 0), reused 0 (delta 0), pack-reused 0
    remote: Failed to execute stage 3: argument list too long
    To git.sr.ht:~graywolf/repro-279
     * [new branch]      b1 -> b1
    . . .
     * [new branch]      b99 -> b99
     * [new branch]      master -> master
     * [new tag]         t_b1 -> t_b1
    . . .
     * [new tag]         t_b99 -> t_b99

So as you can see, the line

remote: Failed to execute stage 3: argument list too long

is still there. I will leave the repository up this time, it is located at

https://git.sr.ht/~graywolf/repro-279



W.

-- There are only two hard things in Computer Science: cache invalidation, naming things and off-by-one errors.

~nabijaczleweli a month ago

Oh, cool, I managed to repro this thanks to your new comment (though, admittedly, slightly differently).

pu.c, build with cc pu.c -O3 -opu:

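#include <stdio.h>
#include <string.h>

/* Print the combined length of all argument and environment strings. */
int main(int argc, char ** argv, char ** envp) {
	unsigned bc = 0;
	for(; *argv; ++argv)
		bc += strlen(*argv);
	for(; *envp; ++envp)
		bc += strlen(*envp);
	printf("pu: %u\n", bc);
	return 0;
}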

An additional test program, al.c:

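#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
	/* Zero-filled, so arg starts out as the empty string. */
	char * arg = calloc(10 * 1024 * 1024, 1);

	for(;;) {
		printf("a : %zu\n", strlen(arg));
		fflush(stdout);  /* keep buffered output out of the child */
		if(!fork())
			if(execlp("./pu", "pu", arg, (char *)0) < 0)
				printf("e : %d\n", errno);
		/* A child whose exec failed has no children, so wait() fails and it exits here. */
		if(wait(0) < 0) {
			printf("ew: %d\n", errno);
			fflush(stdout);  /* _exit() does not flush stdio */
			_exit(0);
		}
		/* Grow the argument by one byte per iteration. */
		strcat(arg, "a");
	}
}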

Play around correlating ulimit -s with getconf ARG_MAX: setting the stack limit to 512 dropped the maximum to 131072 (from the 8192/2097152 it is by default on my Buster). Then run ./al > output and look for where the pu: lines stop and an e:/ew: line appears; that is a good high-water mark.
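The two data points above (8192 KiB of stack giving 2097152, 512 KiB giving 131072) fit the rule that Linux allows a quarter of the stack soft limit for arguments and environment, with a floor of 128 KiB. The sketch below encodes that rule as an assumption, not as a guarantee for every kernel or libc, and also shows how to ask the running system directly. Both function names are invented for illustration.

```c
#include <assert.h>
#include <unistd.h>

/* Assumed Linux rule: ARG_MAX is a quarter of the stack soft limit,
 * but never less than 128 KiB. stack_kib is what `ulimit -s` prints. */
long arg_max_for_stack_kib(long stack_kib) {
	assert(stack_kib > 0);
	long quarter = stack_kib * 1024 / 4;
	return quarter < 131072 ? 131072 : quarter;
}

/* What the running system reports, the same value `getconf ARG_MAX` shows.
 * sysconf() may return -1 if the limit is indeterminate. */
long current_arg_max(void) {
	return sysconf(_SC_ARG_MAX);
}
```

Comparing arg_max_for_stack_kib() for the current ulimit -s against current_arg_max() is a quick way to see whether the assumption holds on a given machine.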

Now, git init --bare bare.git and cp -p pu bare.git/hooks/post-update, and push a horrible repository there. For me, the 1500-iter proceeded as usual and printed remote: pu: <number>. However, when pushing a 15000-iter repo, I got

fatal: cannot exec 'hooks/post-update': Argument list too long
To /home/nabijaczleweli/uwu/bare.git/
 * [new branch]        b1 -> b1

This is inherently unavoidable with the design of git (and, well, we had to try to fuck it up, on a sub-default ulimit; it'll be fine).

However, your git output has remote: Failed to execute stage 3: argument list too long, which corresponds to this line in the post-update hook — of note are the two JSON bundles, at least one of which has one entry per ref, very quickly ballooning it past ARG_MAX, and causing the error you saw.

A patch to address this by using something different (at least as a fallback) shouldn't be too hard. ~sircmpwn, do you have a preference of tempfiles/pipes/some other IPC/inheritance mechanism here?

~sircmpwn a month ago

We could write the blob to stdin, perhaps.
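For illustration, here is a rough C sketch of what handing the blob over on stdin looks like, in the style of the test programs earlier in this thread. It is not the git.sr.ht implementation or the eventual fix; run_with_stdin is a hypothetical helper. The point is that data written through a pipe is not subject to ARG_MAX, so the payload can be as large as needed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (looked up in PATH) with `payload` on its stdin instead of in
 * its argument list. Returns the child's exit status, or -1 on failure.
 * Hypothetical helper. A real caller would also want to ignore SIGPIPE in
 * case the child exits before reading everything. */
int run_with_stdin(char *const argv[], const char *payload, size_t len) {
	assert(argv && argv[0]);

	int fds[2];
	if (pipe(fds) < 0)
		return -1;

	pid_t pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: the read end of the pipe becomes stdin. */
		close(fds[1]);
		if (fds[0] != STDIN_FILENO) {
			if (dup2(fds[0], STDIN_FILENO) < 0)
				_exit(127);
			close(fds[0]);
		}
		execvp(argv[0], argv);
		_exit(127); /* exec failed */
	}

	/* Parent: write the whole payload, handling short writes. */
	close(fds[0]);
	while (len > 0) {
		ssize_t n = write(fds[1], payload, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			break;
		}
		payload += n;
		len -= (size_t)n;
	}
	close(fds[1]); /* signals EOF to the child */

	int status;
	while (waitpid(pid, &status, 0) < 0)
		if (errno != EINTR)
			return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would build argv as usual, for example {"sh", "-c", "cat >/dev/null", NULL}, and pass the JSON as payload; the receiving stage then reads stdin until EOF rather than looking at its arguments.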

~sircmpwn REPORTED FIXED a month ago
