From: Stefan Sperling Subject: Re: teach gotadmin cleanup to remove redundant packfiles To: Omar Polo Cc: gameoftrees@openbsd.org Date: Sun, 18 Jun 2023 21:20:33 +0200 On Sun, Jun 18, 2023 at 08:25:12PM +0200, Omar Polo wrote: > > Should we lock the file before deleting it, like remove_packidx() which > > is used by gotadmin cleanup -p? > > In fact, should we call remove_packidx() here of unlinking .idx directly? > > > > I see that git-repack doesn't seem to bother with such locking but it is > > confusing to have an "unlink_pack" function which doesn't take a lock > > and a "remove_packidx" function which takes a lock. > > I'm still not sure about the locking needed here, in general. I find this situation interesting because so far there has never been a need to lock more than one file at a time to avoid races. When I started learning about Git's internals I was surprised about the apparent lack of a global write lock. In SVN there is a global repository write lock which is grabbed while committing a new transaction, changing revision properties, or packing revision shards (unlike git's packs, this essentially just writes the contents of 1000 revisions files into a pack file to save inodes -- the individual files already contain deltified data.). Any operation which changes the repository uses this lock. The fact that git-gc keeps its own lock which seems to be ignored by most (all?) other Git commands is curious. gc.pid is a kind of global write lock but it's not the same lock as the locks used by code which updates references. Does this really mean that modifying refs while gc/cleanup are running is always safe? > I'm think that we should just lock the whole `gotadmin cleanup' > function, which would make locking before the removal of lonely idx > unneeded and fix concurred redundant packfiles removal[*]. > > I'm not seeing why we should lock the lonely index before removing it, > as we don't seem to check for that lock in other part of the codebase > AFAICS. haven't checked git yet. I probably added this locking simply based on how refs get locked. If Git never looks for an .idx.lock file then this locking is indeed pointless. > Diff below is what I think the locking should be like, but I know it's > not ok since it includes got_lib_lockfile.h in gotadmin. I'm open to > suggestion for a got_repository_cleanup_lock() API should the lock > scheme be ok. Apart from the missing abstraction to avoid exposing got_lib_lockfile.h I think this diff is ok. You can add appropriate wrappers into lib/repository_admin.c and call them from gotadmin.c. This has the advantage that other admin interfaces can be written and take advantage of the same locking without including private headers. Regarding naming you could stick to a convention we use in some places where there is a "prepare" and a "complete" function, such as "got_repo_cleanup_prepare" and "got_repo_cleanup_complete".