"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: gotadmin cleanup
To:
Christian Weisgerber <naddy@mips.inka.de>
Cc:
gameoftrees@openbsd.org
Date:
Sun, 4 Jul 2021 20:09:51 +0200

Download raw body.

Thread
  • Christian Weisgerber:

    gotadmin cleanup

  • On Sun, Jul 04, 2021 at 07:14:43PM +0200, Christian Weisgerber wrote:
    > I guess in modern environments it's not a bottleneck per se, but
    > should the progress output be moderated?  Updating the count for
    > each commit produces a lot of spew:
    > 
    > $ script -c 'gotadmin cleanup -r ports.git'
    > Script started, output file is typescript
    > 103 loose objects; 158146 commits scanned; 20 objects purged
    > loose total size before: 455K
    > loose total size after: 219K
    > disk space freed: 237K
    > loose objects also found in pack files: 20
    > Script done, output file is typescript
    > $ ls -lh typescript
    > -rw-r--r--  1 naddy  naddy   6.2M Jul  4 14:40 typescript
    > 
    > Unfortunately, I'm not sure how to do this.  Any sort of skipping
    > updates requires the addition of a flush operation to make sure the
    > last update actually gets printed.
    
    Unfortunately we cannot know the number of commits which will be
    processed ahead of time. Otherwise we could simply show a percentage.
    
    Note that processing a commit does take some time. About as long as
    got tree -R initially, and then fractions of that depending on
    which subtrees were changed in a commit. Initially I had it print
    progress while traversing tree entries and that was really quite bad.
    
    In got-index-pack we only report pack file size changes in increments of
    1024 bytes. In that case progress report occurs via imsg and performance
    was quite bad without this optimization.
    
    > I came up with the patch below, but I don't like it at all.
    > 
    > Also, there are already a number of progress callbacks implemented,
    > so maybe a generic solution is needed?
    
    I guess if we're producing output too quickly we could consider showing
    a different set of information to the user. For example, instead of
    reporting the number of commit scanned we could show the number of
    referenced loose objects which have been found while scanning commmits.
    I chose to display the number of commits scanned because this allows the
    user to tell that we're making progress no matter what the result will be.
    
    Failing that, if you don't like your time-based rate limiting, we could
    rate-limit based on the number of times the callback is invoked.
    We could rate-limit calls to fflush(stdout) in the callback to, say,
    every 10 times it gets called (where 10 could be optimized according to
    benchmarks). And add a final fflush(stdout) call when the operation has
    completed. Something like this:
    
    diff 9614da0defceed78e0660d5e75d116b96feeccd6 /home/stsp/src/got
    blob - 3a8fe2211096a986460483094705912532891e8e
    file + gotadmin/gotadmin.c
    --- gotadmin/gotadmin.c
    +++ gotadmin/gotadmin.c
    @@ -909,6 +909,7 @@ struct got_cleanup_progress_arg {
     	int verbosity;
     	int printed_something;
     	int dry_run;
    +	int nonflushed;
     };
     
     static const struct got_error *
    @@ -954,7 +955,10 @@ cleanup_progress(void *arg, int nloose, int ncommits, 
     	}
     	if (print_loose || print_commits || print_purged) {
     		a->printed_something = 1;
    -		fflush(stdout);
    +		if (a->nonflushed++ >= 10) {
    +			fflush(stdout);
    +			a->nonflushed = 0;
    +		}
     	}
     	return NULL;
     }
    @@ -1061,6 +1065,7 @@ cmd_cleanup(int argc, char *argv[])
     		} else
     			printf("disk space freed: %s\n", scaled_diff);
     		printf("loose objects also found in pack files: %d\n", npacked);
    +		fflush(stdout);
     	}
     done:
     	if (repo)
    
    
    
  • Christian Weisgerber:

    gotadmin cleanup