From: Omar Polo
Subject: Re: add and use got_repo_get_object_format()
To: Stefan Sperling
Cc: gameoftrees@openbsd.org
Date: Mon, 27 Feb 2023 16:15:26 +0100

On 2023/02/27 15:33:19 +0100, Stefan Sperling wrote:
> On Mon, Feb 27, 2023 at 03:09:25PM +0100, Omar Polo wrote:
> > blob - 83d718aa029bf9afded3debb15731d617de2169c
> > file + lib/got_lib_repository.h
> > --- lib/got_lib_repository.h
> > +++ lib/got_lib_repository.h
> > @@ -63,6 +63,7 @@ struct got_repository {
> >  	char *path;
> >  	char *path_git_dir;
> >  	int gitdir_fd;
> > +	enum got_hash_algorithm algo;
>
> I wonder if this part will break if/when Git implements support for multiple
> types of object ID hashes in a single repository. I suspect Git will keep
> flagging the sha256 object format extension as "experimental" until Git
> supports multiple types of hashes in the same repository. Until they move
> their format out of "experimental" status we should be a bit careful in
> our assumptions.
>
> There is a possibility that Git will break support with the pure SHA256
> repositories they have implemented now (though I hope not). In that case we
> would probably keep our SHA256 support working as it is, if our support is
> already in production use by then, and rely on SHA1-only and mixed-hash
> repositories where interop with Git is required.

My understanding is that git explicitly rejected the idea of having
mixed-hash repositories. In hash-function-transition.txt[0], the
"Non-Goals" section explicitly lists "Intermixing objects using
multiple hash functions in a single repository." (To be fair, half of
the Non-Goals items are, in fact, long-term goals.)

[0]:

The way git-config(1) describes extensions.objectFormat hints that
there's only one possible value (either sha1 or sha256) and that it
should only be set by git-init(1) or git-clone(1), otherwise it "will
produce hard-to-diagnose issues."

Granted, I don't know for sure whether git will follow this path in
the future or change plans.
The sha256 support is still labelled as experimental, but it's already
working and mentioned in different places, so I'm at least inclined to
assume that there won't be this kind of major change. Just my opinion
though, I don't know any of the git developers and don't know what
their plans are.

Do you have any idea if and how they're gonna implement mixed-hash
repositories?

(Git documents a translation table to support a bidirectional mapping
between sha1 and sha256 object ids. We may need to implement it to
have fetch and send working across different hashes (i.e. sha256
locally and sha1 remotely or vice versa), or to allow specifying
object ids on the command line as sha1 on sha256 repos.)

> To make things easier for us in the future it might be a good idea to use
> a bitmask of object formats present in the repository, rather than one value:
>
> 	uint32_t supported_hash_algorithms; /* (1 << got_hash_algorithm) */

Actually, I've been thinking of defining the items of the
got_hash_algorithm enum as if they were bitmasks, but I like this
approach of shifting by the enum value more. It could turn out useful
for functions like got_hash_init() if we ever need to compute
multiple hashes in one go.

> In repositories created by Got, only one bit would be set. And in repositories
> created by Git which supports multiple hashes multiple bits would be set.
> We would need to refuse to work on mixed-hash repositories until we can write
> to them safely, at least in a way that keeps Git happy, even if we don't expose
> mixed-hash support ourselves.
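
To make the bitmask idea concrete, here is a rough sketch of how the
(1 << got_hash_algorithm) scheme could look, including the "refuse
mixed-hash repositories" check. This is only an illustration, not code
from the tree: the enum is redeclared here so the snippet stands alone
(its members are assumed to be GOT_HASH_SHA1 and GOT_HASH_SHA256), and
hash_algo_bit(), repo_has_hash_algo() and repo_is_mixed_hash() are
hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Redeclared for illustration; assumed to mirror Got's enum. */
enum got_hash_algorithm {
	GOT_HASH_SHA1,
	GOT_HASH_SHA256,
};

/* Hypothetical helper: the mask bit that represents one algorithm. */
static inline uint32_t
hash_algo_bit(enum got_hash_algorithm algo)
{
	/* A 32-bit mask only has room for enum values 0..31. */
	assert((unsigned int)algo < 32);
	return (uint32_t)1 << algo;
}

/* Hypothetical helper: non-zero if the mask contains the algorithm. */
static inline int
repo_has_hash_algo(uint32_t mask, enum got_hash_algorithm algo)
{
	return (mask & hash_algo_bit(algo)) != 0;
}

/*
 * Hypothetical helper: non-zero if more than one bit is set, i.e. the
 * repository mixes object formats and should be refused for now.
 */
static inline int
repo_is_mixed_hash(uint32_t mask)
{
	return (mask & (mask - 1)) != 0;
}
```

With something like this, a repository created by Got would carry
exactly one bit, e.g. hash_algo_bit(GOT_HASH_SHA256), and opening a
repository could fail early whenever repo_is_mixed_hash() is true.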