Stefan Sperling <stsp@stsp.name>
Re: add and use got_repo_get_object_format()
Omar Polo <op@omarpolo.com>
Mon, 27 Feb 2023 16:43:59 +0100

On Mon, Feb 27, 2023 at 04:15:26PM +0100, Omar Polo wrote:
> My understanding is that git explicitly rejected the idea of having
> mixed-hash repositories.  In hash-function-transition.txt[0], the
> "Non-Goals" section explicitly lists "Intermixing objects using
> multiple hash functions in a single repository."  (To be fair, half
> of the Non-Goals items are, in fact, long-term goals.)
> [0]: <https://git-scm.com/docs/hash-function-transition>
> The way git-config(1) describes extensions.objectFormat hints that
> only one value can be set (either sha1 or sha256), that it should
> only be set by git-init(1) or git-clone(1), and that trying to change
> it afterwards "will produce hard-to-diagnose issues."

I see. What I meant is really the translation table you mention below.
I thought this might cause multiple hash types to be listed as extensions,
but you are showing that this is not in fact the case (unless Git devs
change their mind on this in the future, but as you say this is not likely).
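Since extensions.objectFormat holds exactly one value, determining a repository's object format comes down to a single lookup. Here is a minimal sketch in C. It assumes a hypothetical helper parse_object_format() and an enum modeled on got_hash_algorithm from this thread; this is not got's actual code. A missing value falls back to sha1, because git-config(1) says sha1 is assumed when the key is not specified. Any value other than sha1 or sha256 is rejected.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; the name mirrors got_hash_algorithm from the thread. */
enum got_hash_algorithm {
	GOT_HASH_SHA1,
	GOT_HASH_SHA256,
};

/*
 * Hypothetical helper: map the value of extensions.objectFormat to a
 * hash algorithm.  A NULL value means the extension is absent, in
 * which case sha1 is assumed.  Returns 0 on success, -1 for values
 * that are neither "sha1" nor "sha256".
 */
static int
parse_object_format(const char *value, enum got_hash_algorithm *algo)
{
	if (value == NULL || strcmp(value, "sha1") == 0) {
		*algo = GOT_HASH_SHA1;
		return 0;
	}
	if (strcmp(value, "sha256") == 0) {
		*algo = GOT_HASH_SHA256;
		return 0;
	}
	return -1;
}
```

Returning a single enum value, rather than a set, matches the single-valued extension.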
> (Git documents a translation table to support a bidirectional mapping
> between sha1 and sha256 object ids.  We may need to implement it to
> have fetch and send working across different hashes (i.e. sha256
> locally and sha1 remotely or vice-versa), or to allow specifying
> object ids on the command line as sha1 on sha256 repos.)

Ok, so we can probably assume that only one hash will ever be used at
a time, as far as object IDs stored outside this mythical translation
table are concerned. That is a relief :)
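The translation table mentioned above amounts to a bidirectional map between the sha1 ID and the sha256 ID of each object. Below is a toy, in-memory sketch in C under these assumptions: the names struct oid_pair, struct oid_table, sha1_to_sha256() and sha256_to_sha1() are hypothetical, and a linear search stands in for a real index. Git's hash-function-transition document describes its own storage for this mapping, and this sketch does not attempt to reproduce it.

```c
#include <stddef.h>
#include <string.h>

#define SHA1_DIGEST_LEN   20
#define SHA256_DIGEST_LEN 32

/* Hypothetical entry pairing the two IDs of the same object. */
struct oid_pair {
	unsigned char sha1[SHA1_DIGEST_LEN];
	unsigned char sha256[SHA256_DIGEST_LEN];
};

/* Hypothetical in-memory table; linear search keeps the sketch short. */
struct oid_table {
	struct oid_pair *pairs;
	size_t npairs;
};

/* Return the sha256 ID for a sha1 ID, or NULL if the object is unknown. */
static const unsigned char *
sha1_to_sha256(const struct oid_table *t, const unsigned char *sha1)
{
	size_t i;

	for (i = 0; i < t->npairs; i++) {
		if (memcmp(t->pairs[i].sha1, sha1, SHA1_DIGEST_LEN) == 0)
			return t->pairs[i].sha256;
	}
	return NULL;
}

/* Return the sha1 ID for a sha256 ID, or NULL if the object is unknown. */
static const unsigned char *
sha256_to_sha1(const struct oid_table *t, const unsigned char *sha256)
{
	size_t i;

	for (i = 0; i < t->npairs; i++) {
		if (memcmp(t->pairs[i].sha256, sha256, SHA256_DIGEST_LEN) == 0)
			return t->pairs[i].sha1;
	}
	return NULL;
}
```

Everything outside such a table would keep using one hash, which is the assumption the reply above settles on.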

I suppose your patch is fine as it is then.

> > To make things easier for us in the future it might be a good idea to use
> > a bitmask of object formats present in the repository, rather than one value:
> > 
> > 	uint32_t supported_hash_algorithms; /* (1 << got_hash_algorithm) */
> Actually, I've been thinking of defining the items of the
> got_hash_algorithm enum as if they were bitmasks, but I like this
> approach of shifting by the enum value more.  It could turn out useful
> for functions like got_hash_init() if we ever need to compute
> multiple hashes in one go.

Feel free to use this idea in other contexts where it makes sense.
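The bitmask quoted above can be sketched in C as follows. Only the (1 << got_hash_algorithm) encoding comes from the thread. The helper names and the GOT_HASH_NALGOS counter are hypothetical and are not part of got. The iteration helper illustrates the got_hash_init() case, where a caller would set up one hash context per algorithm present in the mask.

```c
#include <stdint.h>

/* Enum modeled on got_hash_algorithm; GOT_HASH_NALGOS is hypothetical. */
enum got_hash_algorithm {
	GOT_HASH_SHA1,
	GOT_HASH_SHA256,
	GOT_HASH_NALGOS
};

/* Bit for one algorithm, as in (1 << got_hash_algorithm). */
#define HASH_ALGO_BIT(algo) ((uint32_t)1 << (algo))

/* Record that objects hashed with algo may be present. */
static void
add_hash_algorithm(uint32_t *supported_hash_algorithms,
    enum got_hash_algorithm algo)
{
	*supported_hash_algorithms |= HASH_ALGO_BIT(algo);
}

/* Return non-zero if algo is set in the mask. */
static int
has_hash_algorithm(uint32_t supported_hash_algorithms,
    enum got_hash_algorithm algo)
{
	return (supported_hash_algorithms & HASH_ALGO_BIT(algo)) != 0;
}

/*
 * Call fn once for each algorithm set in the mask, e.g. to initialize
 * one hash context per algorithm when computing several hashes at once.
 */
static void
foreach_hash_algorithm(uint32_t supported_hash_algorithms,
    void (*fn)(enum got_hash_algorithm, void *), void *arg)
{
	int i;

	for (i = 0; i < GOT_HASH_NALGOS; i++) {
		if (has_hash_algorithm(supported_hash_algorithms,
		    (enum got_hash_algorithm)i))
			fn((enum got_hash_algorithm)i, arg);
	}
}
```

A benefit of shifting by the enum value rather than defining the enum items as bitmasks is that the enum values stay small consecutive integers. They can then still serve as array indices, for example into a table of digest lengths, while the mask bits are derived from them.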