"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: teach gotadmin cleanup to remove redundant packfiles
To:
Omar Polo <op@omarpolo.com>
Cc:
gameoftrees@openbsd.org
Date:
Sun, 18 Jun 2023 21:20:33 +0200

Download raw body.

Thread
On Sun, Jun 18, 2023 at 08:25:12PM +0200, Omar Polo wrote:
> > Should we lock the file before deleting it, like remove_packidx() which
> > is used by gotadmin cleanup -p?
> > In fact, should we call remove_packidx() here of unlinking .idx directly?
> > 
> > I see that git-repack doesn't seem to bother with such locking but it is
> > confusing to have an "unlink_pack" function which doesn't take a lock
> > and a "remove_packidx" function which takes a lock.
> 
> I'm still not sure about the locking needed here, in general.

I find this situation interesting because so far there has never
been a need to lock more than one file at a time to avoid races.
When I started learning about Git's internals I was surprised
about the apparent lack of a global write lock.

In SVN there is a global repository write lock which is grabbed
while committing a new transaction, changing revision properties,
or packing revision shards (unlike git's packs, this essentially
just writes the contents of 1000 revisions files into a pack file
to save inodes -- the individual files already contain deltified
data.). Any operation which changes the repository uses this lock.

The fact that git-gc keeps its own lock which seems to be ignored
by most (all?) other Git commands is curious. gc.pid is a kind
of global write lock but it's not the same lock as the locks used
by code which updates references. Does this really mean that
modifying refs while gc/cleanup are running is always safe?

> I'm think that we should just lock the whole `gotadmin cleanup'
> function, which would make locking before the removal of lonely idx
> unneeded and fix concurred redundant packfiles removal[*].
> 
> I'm not seeing why we should lock the lonely index before removing it,
> as we don't seem to check for that lock in other part of the codebase
> AFAICS.  haven't checked git yet.

I probably added this locking simply based on how refs get locked.
If Git never looks for an .idx.lock file then this locking is
indeed pointless.

> Diff below is what I think the locking should be like, but I know it's
> not ok since it includes got_lib_lockfile.h in gotadmin.  I'm open to
> suggestion for a got_repository_cleanup_lock() API should the lock
> scheme be ok.

Apart from the missing abstraction to avoid exposing got_lib_lockfile.h
I think this diff is ok.

You can add appropriate wrappers into lib/repository_admin.c and
call them from gotadmin.c. This has the advantage that other admin
interfaces can be written and take advantage of the same locking
without including private headers.

Regarding naming you could stick to a convention we use in some
places where there is a "prepare" and a "complete" function,
such as "got_repo_cleanup_prepare" and "got_repo_cleanup_complete".