
From: Omar Polo <op@omarpolo.com>
Subject: Re: add and use got_repo_get_object_format()
To: Stefan Sperling <stsp@stsp.name>
Date: Mon, 27 Feb 2023 16:15:26 +0100


On 2023/02/27 15:33:19 +0100, Stefan Sperling <stsp@stsp.name> wrote:
> On Mon, Feb 27, 2023 at 03:09:25PM +0100, Omar Polo wrote:
> > blob - 83d718aa029bf9afded3debb15731d617de2169c
> > file + lib/got_lib_repository.h
> > --- lib/got_lib_repository.h
> > +++ lib/got_lib_repository.h
> > @@ -63,6 +63,7 @@ struct got_repository {
> >  	char *path;
> >  	char *path_git_dir;
> >  	int gitdir_fd;
> > +	enum got_hash_algorithm algo;
> I wonder if this part will break if/when Git implements support for multiple
> types of object ID hashes in a single repository. I suspect Git will keep
> flagging the sha256 object format extension as "experimental" until Git
> supports multiple types of hashes in the same repository. Until they move
> their format out of "experimental" status we should be a bit careful in
> our assumptions.
> There is a possibility that Git will break support with the pure SHA256
> repositories they have implemented now (though I hope not). In that case we
> would probably keep our SHA256 support working as it is, if our support is
> already in production use by then, and rely on SHA1-only and mixed-hash
> repositories where interop with Git is required.

My understanding is that git explicitly rejected the idea of having
mixed-hash repositories.  In hash-function-transition.txt[0], under
the "Non-Goals" section there's explicitly listed "Intermixing objects
using multiple hash functions in a single repository."

(to be fair half of the Non-Goals items are, in fact, long term

[0]: <https://git-scm.com/docs/hash-function-transition>

The way git-config(1) describes extensions.objectFormat hints that
there's only one possible value (either sha1 or sha256) and that it
should only be set by git-init(1) or git-clone(1); changing it
afterwards "will produce hard-to-diagnose issues."
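
To make the "one value per repository" reading concrete, here is a
rough sketch of how that setting could be mapped to a single
algorithm.  The enum and function names are made up for illustration
and are not the actual got API; the only facts relied upon are the two
accepted values and that Git assumes sha1 when the key is absent.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the real enum; names are illustrative. */
enum hash_algo { HASH_SHA1, HASH_SHA256 };

/*
 * Map the value of extensions.objectFormat to one algorithm.
 * A NULL value means the key is absent, in which case sha1 is assumed.
 * Returns 0 on success, -1 for an unrecognised value.
 */
static int
parse_object_format(const char *value, enum hash_algo *algo)
{
	assert(algo != NULL);

	if (value == NULL || strcmp(value, "sha1") == 0) {
		*algo = HASH_SHA1;
		return 0;
	}
	if (strcmp(value, "sha256") == 0) {
		*algo = HASH_SHA256;
		return 0;
	}
	return -1;	/* unknown format: refuse rather than guess */
}
```

Returning an error for anything else keeps us from silently treating
an unknown future format as sha1.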

Granted, I don't know for sure whether git will follow this path in
the future or change plans.  The sha256 support is still labelled as
experimental, but it's already working and mentioned in different
places, so I'm at least inclined to assume that there won't be this
kind of major change.  Just my opinion though, I don't know any of
the git developers and don't know what their plans are.

Do you have any idea of if and how they're gonna implement mixed hash

(Git documents a translation table to support a bidirectional mapping
between sha1 and sha256 object ids.  We may need to implement it to
have fetch and send working across different hashes (i.e. sha256
locally and sha1 remotely or vice-versa), or to allow specifying
object ids on the command line as sha1 on sha256 repos.)

> To make things easier for us in the future it might be a good idea to use
> a bitmask of object formats present in the repository, rather than one value:
> 	uint32_t supported_hash_algorithms; /* (1 << got_hash_algorithm) */

Actually, I had been thinking of defining the items of the
got_hash_algorithm enum as if they were bitmasks, but I like this
approach of shifting by the enum value more.  It could turn out useful
for functions like got_hash_init() if we ever need to compute
multiple hashes in one go.
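
Something along these lines, as a minimal sketch only.  The enum and
the helper names are placeholders I made up for the example, not what
is in the tree; the only part taken from your suggestion is the
(1 << algorithm) encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the real enum; values are illustrative. */
enum hash_algo {
	HASH_SHA1,
	HASH_SHA256,
	HASH_NALGOS	/* number of known algorithms */
};

/* Turn an enum value into its bit in the mask. */
static uint32_t
hash_algo_bit(enum hash_algo algo)
{
	assert(algo < HASH_NALGOS);
	return (uint32_t)1 << algo;
}

/* Return 1 if the given algorithm is present in the mask. */
static int
hash_mask_has(uint32_t mask, enum hash_algo algo)
{
	return (mask & hash_algo_bit(algo)) != 0;
}

/* Return 1 if exactly one bit is set, i.e. a single-hash repository. */
static int
hash_mask_is_single(uint32_t mask)
{
	return mask != 0 && (mask & (mask - 1)) == 0;
}
```

With this the enum stays a plain 0, 1, 2... sequence usable as an
array index, and a check like hash_mask_is_single() would be enough to
refuse repositories with more than one bit set.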

> In repositories created by Got, only one bit would be set. And in repositories
> created by Git which supports multiple hashes multiple bits would be set.
> We would need to refuse to work on mixed-hash repositories until we can write
> to them safely, at least in a way that keeps Git happy, even if we don't expose
> mixed-hash support ourselves.