"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: add and use got_repo_get_object_format()
To:
Omar Polo <op@omarpolo.com>
Cc:
gameoftrees@openbsd.org
Date:
Mon, 27 Feb 2023 15:33:19 +0100

Download raw body.

Thread
On Mon, Feb 27, 2023 at 03:09:25PM +0100, Omar Polo wrote:
> This adds a new field to the struct got_repository and a getter for it
> so that we can avoid hardcoding GOT_HASH_SHA1 in a few places.  At the
> moment it is implicitly set to GOT_HASH_SHA1, but once we have more
> support for sha256 (the immediate targets are packfiles and loose
> objects) we can start honouring the `objectformat=sha256' and have all
> the code in repository.c / repository_admin.c / reference.c working
> regardless of the hash used.
> 
> There is a concern, which luckily doesn't have any consequence but it's
> still worth noticing: lib/repository.c:/^is_git_repo tries to resolve
> "HEAD" before the repo config is parsed.  This means that, say, on a
> sha256 repos we'd parse the object ids as sha1.  Since we do nothing
> with the resolved ref this doesn't cause issues in practice but is still
> weird.

That could easily be tweaked. Parsing HEAD is just one heuristic to find
out whether a directory might contain a repository. We might as well parse
the configuration file first, then parse HEAD.

> diff /home/op/w/got
> commit - 5c5396e5c1ee0b8ede1cce4c7262a946ab1b3970
> path + /home/op/w/got
> blob - 53e99801ab01ad8804d048c5f59a94930e5ef10a
> file + include/got_repository.h
> --- include/got_repository.h
> +++ include/got_repository.h
> @@ -35,6 +35,9 @@ int got_repo_get_fd(struct got_repository *);
>  /* Obtain the file descriptor of the repository's .git directory. */
>  int got_repo_get_fd(struct got_repository *);
>  
> +/* Obtain the object format */
> +enum got_hash_algorithm got_repo_get_object_format(struct got_repository *);
> +
>  /* Obtain the commit author name if parsed from gitconfig, else NULL. */
>  const char *got_repo_get_gitconfig_author_name(struct got_repository *);
>  
> blob - 83d718aa029bf9afded3debb15731d617de2169c
> file + lib/got_lib_repository.h
> --- lib/got_lib_repository.h
> +++ lib/got_lib_repository.h
> @@ -63,6 +63,7 @@ struct got_repository {
>  	char *path;
>  	char *path_git_dir;
>  	int gitdir_fd;
> +	enum got_hash_algorithm algo;

I wonder if this part will break if/when Git implements support for multiple
types of object ID hashes in a single repository. I suspect Git will keep
flagging the sha256 object format extension as "experimental" until Git
supports multiple types of hashes in the same repository. Until they move
their format out of "experimental" status we should be a bit careful in
our assumptions.

There is a possibility that Git will break support with the pure SHA256
repositories they have implemented now (though I hope not). In that case we
would probably keep our SHA256 support working as it is, if our support is
already in production use by then, and rely on SHA1-only and mixed-hash
repositories where interop with Git is required.

To make things easier for us in the future it might be a good idea to use
a bitmask of object formats present in the repository, rather than one value:

	uint32_t supported_hash_algorithms; /* (1 << got_hash_algorithm) */

In repositories created by Got, only one bit would be set. And in repositories
created by Git which supports multiple hashes multiple bits would be set.
We would need to refuse to work on mixed-hash repositories until we can write
to them safely, at least in a way that keeps Git happy, even if we don't expose
mixed-hash support ourselves.