From: Stefan Sperling Subject: Re: gotadmin cleanup To: Christian Weisgerber Cc: gameoftrees@openbsd.org Date: Sun, 4 Jul 2021 20:09:51 +0200 On Sun, Jul 04, 2021 at 07:14:43PM +0200, Christian Weisgerber wrote: > I guess in modern environments it's not a bottleneck per se, but > should the progress output be moderated? Updating the count for > each commit produces a lot of spew: > > $ script -c 'gotadmin cleanup -r ports.git' > Script started, output file is typescript > 103 loose objects; 158146 commits scanned; 20 objects purged > loose total size before: 455K > loose total size after: 219K > disk space freed: 237K > loose objects also found in pack files: 20 > Script done, output file is typescript > $ ls -lh typescript > -rw-r--r-- 1 naddy naddy 6.2M Jul 4 14:40 typescript > > Unfortunately, I'm not sure how to do this. Any sort of skipping > updates requires the addition of a flush operation to make sure the > last update actually gets printed. Unfortunately we cannot know the number of commits which will be processed ahead of time. Otherwise we could simply show a percentage. Note that processing a commit does take some time. About as long as got tree -R initially, and then fractions of that depending on which subtrees were changed in a commit. Initially I had it print progress while traversing tree entries and that was really quite bad. In got-index-pack we only report pack file size changes in increments of 1024 bytes. In that case progress report occurs via imsg and performance was quite bad without this optimization. > I came up with the patch below, but I don't like it at all. > > Also, there are already a number of progress callbacks implemented, > so maybe a generic solution is needed? I guess if we're producing output too quickly we could consider showing a different set of information to the user. For example, instead of reporting the number of commit scanned we could show the number of referenced loose objects which have been found while scanning commmits. I chose to display the number of commits scanned because this allows the user to tell that we're making progress no matter what the result will be. Failing that, if you don't like your time-based rate limiting, we could rate-limit based on the number of times the callback is invoked. We could rate-limit calls to fflush(stdout) in the callback to, say, every 10 times it gets called (where 10 could be optimized according to benchmarks). And add a final fflush(stdout) call when the operation has completed. Something like this: diff 9614da0defceed78e0660d5e75d116b96feeccd6 /home/stsp/src/got blob - 3a8fe2211096a986460483094705912532891e8e file + gotadmin/gotadmin.c --- gotadmin/gotadmin.c +++ gotadmin/gotadmin.c @@ -909,6 +909,7 @@ struct got_cleanup_progress_arg { int verbosity; int printed_something; int dry_run; + int nonflushed; }; static const struct got_error * @@ -954,7 +955,10 @@ cleanup_progress(void *arg, int nloose, int ncommits, } if (print_loose || print_commits || print_purged) { a->printed_something = 1; - fflush(stdout); + if (a->nonflushed++ >= 10) { + fflush(stdout); + a->nonflushed = 0; + } } return NULL; } @@ -1061,6 +1065,7 @@ cmd_cleanup(int argc, char *argv[]) } else printf("disk space freed: %s\n", scaled_diff); printf("loose objects also found in pack files: %d\n", npacked); + fflush(stdout); } done: if (repo)